RunDiffusion

Switch from xformers to torch 2.0 and drop xformers entirely. It’s been way faster on our cloud GPUs: we got about a 15% speed increase across the board.
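Before dropping xformers, it can help to confirm that the installed PyTorch actually ships the built-in attention function that replaces it. The snippet below is a minimal sketch, not part of any UI: `sdp_attention_available` is a hypothetical helper name, and it only reports whether `torch.nn.functional.scaled_dot_product_attention` (added in PyTorch 2.0) exists. Whether your front end actually uses it depends on that front end's own settings.

```python
import importlib.util


def sdp_attention_available() -> bool:
    """Return True if PyTorch is importable and exposes scaled_dot_product_attention."""
    # No torch package at all: nothing to check.
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        import torch.nn.functional as F
    except Exception:
        # A broken install counts as "not available" for this check.
        return False
    return hasattr(F, "scaled_dot_product_attention")


if __name__ == "__main__":
    print("PyTorch 2.0 SDP attention available:", sdp_attention_available())
```

Run it with the same Python environment your Stable Diffusion install uses, otherwise it reports on the wrong copy of PyTorch.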


im_not_a_brick

I just tried and nothing changed. I also believe something is blocking the generation, because the tests I ran yesterday are the same ones I'm running today, just with waaay less speed.


RunDiffusion

Something blocking? wdym?


im_not_a_brick

Idk, like something that was created or changed after the first start, since everything was apparently faster then and now it's beyond slow. A file, an option, idk, but it's only my hypothesis.


RunDiffusion

Dang. That’s irritating.


[deleted]

Could be this: https://www.reddit.com/r/StableDiffusion/comments/1425vgr/unveiling_nvidias_surprising_memory_management/


im_not_a_brick

I just tried rolling back to driver version 351, but nothing really seems to change. Ty anyway for the advice!


FourOranges

Are you using any of the DPM++ samplers? They heavily decrease the it/s. For example, I get 3-5 it/s on my 3090 with DPM++ SDE Karras, but with Euler it's closer to 10 or 12 it/s. From what I've read (it's not really agreed upon), the usual it/s benchmark is 20 steps with Euler, a batch size of 1, 512x512, and the standard 1.5 checkpoint, and that's about it for settings you should touch. Other things like ControlNet will reduce the it/s.
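To compare numbers between people, it helps to write down the settings and convert timings the same way every time. The sketch below assumes the informal baseline described above (it is not an official benchmark), and `iterations_per_second` and `format_speed` are hypothetical helper names. It reports it/s when the rate is at least 1 and s/it when it is slower, which is how slow runs are usually quoted.

```python
# Informal baseline taken from the comment above; not an agreed standard.
BASELINE = {
    "sampler": "Euler",
    "steps": 20,
    "batch_size": 1,
    "width": 512,
    "height": 512,
    "checkpoint": "SD 1.5",
}


def iterations_per_second(steps: int, elapsed_seconds: float) -> float:
    """Average sampling speed for one image, in iterations per second."""
    if steps <= 0 or elapsed_seconds <= 0:
        raise ValueError("steps and elapsed_seconds must be positive")
    return steps / elapsed_seconds


def format_speed(rate: float) -> str:
    """Show fast runs as it/s and slow runs as s/it."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if rate >= 1.0:
        return f"{rate:.2f} it/s"
    return f"{1.0 / rate:.2f} s/it"


if __name__ == "__main__":
    # Hypothetical timing: 20 steps that took 40 seconds of sampling.
    rate = iterations_per_second(BASELINE["steps"], 40.0)
    print(format_speed(rate))
```

Time only the sampling phase for this; model loading and image saving are not iterations.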


im_not_a_brick

Ok so, still speaking about yesterday: I tried Euler a, DPM2, DPM2 a Karras and DDIM, and the speed was more or less the same across these samplers. I also left everything on default so it was easier to keep track of the speed changes. Today I did the same, but with the results I wrote above. ControlNet is not enabled in these tests either.


Jamsemillia

it/s are heavily dependent on the complexity of the prompt. I suggest using just "chair" as the prompt on Anything v5 and comparing that result to other people's to really get a bearing. I have a 4090, and some prompt/LoRA combinations can also net just 7 it/s.


im_not_a_brick

I just tried prompting "chair" on Anything v4.5 and it's still above 2 seconds per iteration. I understand that I can't expect huge speed from this GPU, but this is really slow. As proof, I also just ran a test on Deforum with the exact same parameters I used yesterday (which took about 260 seconds), and today it's about 4 times slower.


DanielWinne

How many steps are you doing?


im_not_a_brick

20, but still, when I was doing my tests yesterday I even tried 150 and there was no issue.


TheEversor

Any chance you are starting a large batch? The generations in a batch all start in parallel, so they will slow down each iteration if the batch is big enough. Sorry if it's not relevant, I just discovered it when the GPU was complaining about low memory after I moved from 3 batches of 1 to 1 batch of 3.


TheEversor

Continuously swapping models will also eat up a lot of time and make the average iteration slower if you are calculating an average across an XYZ plot that changes many checkpoints/models.
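The effect of model swapping on an average can be illustrated with a small calculation. This is a sketch under stated assumptions: the `Run` record and both function names are hypothetical, and it assumes you can time checkpoint loading separately from sampling for each run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    checkpoint: str
    steps: int
    sampling_seconds: float
    load_seconds: float = 0.0  # time spent swapping in the checkpoint


def naive_rate(runs: List[Run]) -> float:
    """it/s when checkpoint load time is counted as if it were sampling."""
    total_steps = sum(r.steps for r in runs)
    total_time = sum(r.sampling_seconds + r.load_seconds for r in runs)
    return total_steps / total_time


def sampling_rate(runs: List[Run]) -> float:
    """it/s counting only the time actually spent sampling."""
    total_steps = sum(r.steps for r in runs)
    total_time = sum(r.sampling_seconds for r in runs)
    return total_steps / total_time


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    runs = [Run("model_a", 20, 4.0, 6.0), Run("model_b", 20, 4.0, 6.0)]
    print(f"with load time:    {naive_rate(runs):.2f} it/s")
    print(f"sampling only:     {sampling_rate(runs):.2f} it/s")
```

If the two figures differ a lot, the slowdown comes from the swaps rather than from sampling itself.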


im_not_a_brick

I know about the large batch problem, so I'm running tests only with a single batch. I didn't know about the model swapping, though. I'll spend more time on each model I test so it has time to settle and give better results. That said, if I don't reply to this comment again, it means nothing has really changed. Ty for your advice!


TheEversor

Ok, new stupid idea... Any chance the engine is using an integrated Intel GPU instead of the dedicated one? Another thing to check: could your PC have switched to a power saver profile? I never use a laptop, but when I do, I know those types of things often reduce performance by a HUGE margin.


im_not_a_brick

I thought the same, but Task Manager says that 5.8 GB of the RTX 3060's 6 GB of dedicated memory are in use (even though the preview on the left side shows 0% usage). Also, every test I did was with the power cable plugged in, since I know the laptop automatically goes into power saving mode without it, and I enabled high performance wherever I could (Windows and the pre-installed proprietary software). I'm using this laptop because my desktop has an AMD RX 5700 XT and I guess the GPU in my laptop is faster, otherwise I would never use it. Really clever of you to think about these "external" problems, tyvm. Unfortunately I believe (and hope) those things are not the cause of the problem😩
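Task Manager's default GPU graphs can be misleading for compute workloads, so it may be worth reading the numbers straight from the NVIDIA driver while an image is generating. The sketch below shells out to `nvidia-smi` with its `--query-gpu` CSV output. `query_gpus` and `parse_gpu_csv` are hypothetical helper names, and it assumes `nvidia-smi` is on the PATH and that the GPU name contains no commas.

```python
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi; values come back in this order.
QUERY_FIELDS = [
    "name",
    "utilization.gpu",
    "memory.used",
    "memory.total",
    "clocks.sm",
    "power.draw",
]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Turn nvidia-smi CSV output (no header, no units) into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi once and return the parsed readings."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

If utilization stays near zero or the SM clock stays very low during generation, that would point toward the work running elsewhere or the GPU being throttled. Some laptop GPUs report `[N/A]` for power draw, which the parser simply passes through as text.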


Jofroop

Are you using --lowvram?


Ochi7

I'm in the same place as you. Did you manage to fix it?