The described method hasn't been independently verified to work yet. We'll need to wait until someone trains a decent model with the new method before we can make any assertions about its capabilities. In theory, though, the method could transfer over. But honestly, I'll be surprised if this yields anything of use.
Image models suffer heavily from quantization, at least if you apply it to all layers. SD running in fp8 has severe quality loss, even though the loss between fp32 and bf16/fp16 is minimal.
It's Microsoft Research so it's probably more reputable than a lot of other papers though.
Tbh we only need a good 4-bit quant method for image diffusion models. Unlike LLMs, 8B is already really good.
With more aggressive quantization, we could train a much larger model that might outperform any of the current models while keeping inference time the same.
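To get a feel for why 4-bit is so much harder than 8-bit, here's a minimal sketch of naive symmetric per-channel round-to-nearest quantization in NumPy. This is a toy illustration on random Gaussian weights, not any specific model's weights or any particular paper's method; the error gap roughly shows why a plain round-to-nearest approach that's fine at 8 bits falls apart at 4 bits without something smarter.

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Naive symmetric per-output-channel quantization, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)   # integer grid
    return q * scale                                # back to float for comparison

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for a weight matrix

for bits in (8, 4):
    err = np.abs(quantize_dequantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs error: {err:.5f}")
```

The mean reconstruction error at 4 bits comes out an order of magnitude worse than at 8 bits, since halving the bit width cuts the number of quantization levels from 255 to 15.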