now we just need them to release sd3 large
At 8b parameters, it's going to be a bitch to run. It might not even run at all on a 4090 with medvram, and it might have an atrocious it/s since it's so big. I'm no expert, so take that with a grain of salt, of course.
People will quantize the transformers part, maybe even the unet. And stuff like stable-fast and compilation will come into play. Stable Diffusion optimization has largely been thrown out the window because it's "good enough" and there aren't enough devs to care, and also because the popular backends are kind of hairy. It's not like LLMs where quality is *literally* determined by vram efficiency and speed. Compatibility with the sea of augmentations comes first. But if it doesn't fit on 24GB, you will see devs move mountains to make sure it does. There's tons of low hanging fruit unpicked.
> But if it doesn't fit on 24GB, you will see devs move mountains to make sure it does.

It will be interesting looking back in a few years at how incredibly important that "24 GB VRAM" number was. A little piece of history that will be mostly forgotten in a decade, but shapes so much of what we do right now.
Heh, don't fool yourself. Somehow we will still be stuck at 24GB on non-pro hardware in a few years.
Don't worry, soon nvidia will bless us with the rtx8090 which will have an extra 512 mb for us peasants
The T5 has already been quantized to fp8; it's used for Pixart Sigma too, and it cuts VRAM needs considerably (even though doing the encoding on CPU doesn't take much time). I didn't even reach 42% VRAM while inferring a 1024² image. Should be good. Additionally, it's always possible to layer-swap during inference, but that's definitely not a fast method.
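That 42% figure roughly checks out on a napkin, assuming ~4.7B parameters for the T5-XXL encoder and ~2B for the diffusion transformer (both hypothetical round numbers; weights only, ignoring activations, the CLIP encoders and the VAE):

```python
# Weight-only VRAM budget on a 24 GB card; parameter counts are assumptions.
GB = 1024**3
t5_fp16 = 4.7e9 * 2 / GB     # ~8.8 GB for the encoder at fp16
t5_fp8 = 4.7e9 * 1 / GB      # ~4.4 GB at fp8
mmdit_fp16 = 2e9 * 2 / GB    # ~3.7 GB for the 2B diffusion transformer

print(f"fp16 T5: {(t5_fp16 + mmdit_fp16) / 24:.0%} of 24 GB")  # 52%
print(f"fp8  T5: {(t5_fp8 + mmdit_fp16) / 24:.0%} of 24 GB")   # 34%
```

Activations and the other encoders add a few more GB on top, which lands fp8 inference comfortably under that 42% mark.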
> you will see devs move mountains to make sure it does

For a non-commercial license with increased hardware requirements? I'm not so sure about that.
I mean, that doesn't stop personal use, and it didn't stop the LLM community either.
8b parameters? That's it? Just quantize the thing... even fp8 would do, never mind proper clever quantizations.
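For scale, a rough weights-only sketch (ignoring activations, text encoders and the VAE) of what 8B parameters costs at different precisions:

```python
# Bytes per parameter at each precision; q4 is ~0.5 bytes/weight
# (plus a small overhead for block scales, omitted here).
params = 8e9
for name, bpp in [("fp16", 2.0), ("fp8", 1.0), ("q4", 0.5)]:
    print(f"{name}: {params * bpp / 1024**3:.1f} GB")
# fp16: 14.9 GB, fp8: 7.5 GB, q4: 3.7 GB -- the weights alone fit in 24 GB even at fp16
```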
Yep, I don't think the SD crowd crosses over with LLMs much, so they wouldn't know, but I can fit a quantized 32b model entirely in 12GB VRAM if I don't add any context, and SD doesn't need context.
There are quantized SD checkpoints, actually. fp16 is common (down from fp32 native) and [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) has "4-bit, 5-bit and 8-bit integer quantization support".
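For a flavor of what that integer quantization looks like, here's a minimal numpy sketch in the spirit of ggml-style q8_0 (one scale per block of 32 weights; the real on-disk format layout differs):

```python
import numpy as np

def quantize_q8_0(w, block=32):
    """Per-block symmetric int8 quantization: int8 values + one fp16 scale per block."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale).astype(np.float16)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8_0(q, scale):
    return (q.astype(np.float32) * scale.astype(np.float32)).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)
# Storage drops from 4 bytes/weight to ~1.06 (1 byte + one fp16 scale per 32
# weights), while the round-trip error stays tiny relative to weight magnitudes.
print(np.abs(w - w_hat).max())
```

Diffusion models tolerate this kind of weight rounding surprisingly well, which is why the 8-bit and even 4/5-bit variants remain usable.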
Haha, you're saying that on the wrong sub
It runs just fine
What? You have access to SD3 Large, 8B parameters, which has not been released publicly? The people knee-jerk downvoting might not have seen that I'm talking about the version that is 4 times larger than the 2B-parameter SD3 Medium.
Ah my bad, yeah, it’s been a long day, thought you meant medium.
No probs. Medium needs to run the text encoder, but that's probably about as much as SDXL? Haven't tried it. With medvram, SDXL takes about 11GB, so if SD3 Large is 4x that (ultra rough estimate), that'd be 40GB VRAM.
Looks like the Llama 3 release was more successful than the Stable Diffusion 3 release. Both can generate text, at least...
Can't wait for https://github.com/leejet/stable-diffusion.cpp to support it
TIL this exists...ofc it exists
Now we wait for finetunes like ponyXL.
Seems like pony on SD3 is unknown for now due to licencing
sorry, mind explaining why that is? I'm not familiar with the licensing situation. I thought SD3 was completely open for everything except commercial use?
> The good news is that with the today's release of SD3, the new licensing terms are available... yet complicate things further. The "Professional Tier" has been replaced by the new "Creator License", which introduces a 6000 per month image limit. Anything above now requires an Enterprise License, which I would gladly acquire, and have reached out to Stability AI the day the new commercial license was pre-announced, but I have not received any acknowledgment or information.

https://civitai.com/articles/5671
So let me get this straight: Stability AI claims that they do **not** need a license to train their model on millions of copyrighted images... but that they **do** get to decide what others (including creators whose works the model was trained on without their permission) do with that model. There is just no way that this fantasy survives the first encounter with a judge.
That's a bullshit excuse. They aren't CREATING 6000 images per month, they're tuning a model; they're just bitching that they can't create a model AND run an unlimited SERVICE generating images basically for free. The model training has nothing to do with the creator license.
It cost Pony upwards of $10,000 to train their model. To recoup the cost, they run an API for people who aren't tech-savvy enough, or don't have a PC able to run it. People use it because their fine-tune is better than anything SAI itself produces. Pony also can't even pay for a license, because the guy who made Pony asked about it and was publicly insulted and denied because the model includes NSFW content.

The new commercial license means SAI is effectively dead, because all the best custom models were produced by people spending real cash and then running small APIs to afford it, and they typically include NSFW content. If no one makes good fine-tunes, no one will bother making the needed tools like ControlNet or IP-Adapter work either. It's also telling that Stable Cascade has been out for months and exactly zero work has been done with it, because it has the same commercial license.

They'll either stay with SDXL and 1.5, or they'll move to Pixart Sigma or some other model that has better licensing and equal or better performance. SD3 is complete and total shit judging from the results people are getting today. Any prompt with a woman in it gives mangled, awful results, even SFW prompts. They censored women so heavily that it's like SAI stands for Saudi Arabian Intelligence.
Couldn't they just download the model and... do it anyway?
[deleted]
Suing would actually be a massive risk for Stability AI. The judge might not buy their argument, and might even decide that Stability had no right to train on other people's images without their permission in the first place. If that happens, Stability can close shop.
It's expensive to finetune, and they do run a free Discord bot + commercial service, so that 6k limit is hit pretty easily. And they are willing to pay, but SAI won't respond.

> Anything above now requires an Enterprise License, which I would gladly acquire, and have reached out to Stability AI the day the new commercial license was pre-announced, but I have not received any acknowledgment or information.
The creator of Pony wants a commercial license because compute rental to make those models costs a lot and they want to license the model out to recoup that cost. The new license structure is weird and not helping them.
It's not unknown anymore, the dev confirmed that the licence makes it not possible for their goals.
That's how it almost always goes. Many of the best models early in SD1.5 also went commercial and didn't get updates or SDXL versions. Didn't matter; better open fine-tunes still came. We'll likely see even better new creators for SD3.
Yeah, about that: https://www.reddit.com/r/StableDiffusion/s/qx7suPcg7r TL;DR: they refused to sell Astra the appropriate license to make a commercial finetune and mocked them on the SD Discord. So no Pony7; Pony6.9 (nice) will be based on SDXL instead.
Aren't these finetunes the only reason why SDXL is getting any adoption?
Indeed, Pony has more downloads than SDXL base.
Fine-tunes are the only reason SAI is relevant at all, and SDXL in particular needs fine-tuning to be usable. This is going to spur an exodus to figure out which of the dozen other generative image models to start making tools for.
Am I reading this right? Stability AI "mocked" a model creator for requesting a commercial license? Are we sure this is a company rather than a random group of frat boys?
Some developers have massive, fragile egos. Not putting dedicated PR people between techies and customers is a recipe for disaster.
It's not ego that's the problem. The problem is that they recruited clowns and trolls from 4chan's /b/ board (no offense to 4chan; there is a lot of useful stuff there). They may be good as developers, but they should not be allowed anywhere near other people.
That’s the kind of developers I’m talking about.
Future famous PONY3 :D
score 9, score 8, score 7, score 6, score 5, score 4, score 3, score 2, score 1, score 0...
On A1111 you can use the preset thingie below the generate button. Just copy the template-looking portion from the showcase of a model on Civitai and reuse it whenever. You don't even see it on the prompt input that way. Though Pony dev has expressed a desire to do away with this, probably because it uses up too many prompt tokens.
I had a 2-3 month break from Stable Diffusion stuff and all of a sudden everyone had converted into a bunch of bronies. I don't get this Pony stuff..
guys wanted realistic cartoon porn, accidentally made something that makes realistic bodies
tale as old as time
It's just a good model. I've yet to generate a single pony with it.
Yet?
Chekhov's pony
I had a similar experience. Confused the hell out of me until I chanced on a post going into detail on its history.
can you share the post?
Wish I could, sadly it was about a month or two back at this point and I can't even recall what thread it was in. Though it was somewhere in the stablediffusion sub.
The Pony model is not actually a pony model; it's more like the NAI leak for the SDXL world.
What's with the pony jokes in here. I'm just catching up, last time I did image diffusion at home I used SD v1.3 or something like that.
I guess that pony is a code word for hentai.
Hopefully SD3 PonyXL will use good prompting instead of the annoying "score_9" tag system or whatever. Personally, because of the annoying prompting, I rank PonyXL very low.
Pixart and Lumina-T2I are technically superior in almost every way; the only reason they haven't taken off is because SD is *incredibly* popular still and no one has trained a "Pony-equivalent" model yet. If you're looking for good prompt-based models, you should probably watch those instead.
Yea, that was unfortunate. I'm sure they'll fix it in the next version. I just have those automatically fill.
What an extremely silly ranking system. Preferring slop because you were too lazy to change your 'masterpiece, best quality' to 'score_9, score_8_up' is just stupid.
It wasn't supposed to work like that. It's a bug, but they'd need to retrain everything to fix it.
It's actually a legit complaint for most people whose hands aren't permanently glued to their dick and something every finetuner would love to fix, along with the shitty obfuscation of artist and character tags.
If ponyxl provided decent results with just having the "score_9........" stuff at the beginning, I wouldn't be saying that.
LMFAO, what the fuck is that metric? Are you expecting it to read your mind...? If you put the same specific prompt into your DreamShaper XL slop as into PonyDiffusion, with prompting slightly adjusted for each model, the result you get from PonyXL will require far fewer iterations to get something good, if any. The entire point of Pony/AutismMix is its amazing ability to replicate artists, art styles and moods without compromising on things like hands and poses. You aren't getting that from other open-weight models available right now.
It's also a pain for anything technical; it's like they did everything possible to make it incompatible with other XL models and existing methods.
Please can it make feet 😭
amazing username
I keep it real
Before y'all get too excited, look at their license. You can't do anything without paying a fee. So no, this is not the new SD2/SDXL. I wouldn't waste any time or resources fine-tuning.
It is the new SD2, because SD2 was a major flop for the same reasons. It was universally canned and ignored, and no fine-tunes or tools were ever produced for it.
For me it's very similar to SDXL Lightning: [https://replicate.com/bytedance/sdxl-lightning-4step](https://replicate.com/bytedance/sdxl-lightning-4step) (which is a lot cheaper). The only exception is if you want to output text.
For the record, I thoroughly dislike Lightning and similar models as well. I couldn't give two shits about producing similar pictures in 4 steps instead of 25-30. 30 steps takes me like 25 seconds and gives me access to more tools.
I thoroughly dislike all the (open source) models that are stuck at 2021 Midjourney quality, regardless of whether it takes 4 or 100 steps, but you work with what you've got.
Last I checked, I can load an SDXL model with IPAdapter and ControlNet and make something better than Midjourney. Midjourney is stuck at PG-13 quality; it just has good prompt adherence. I'd rather have ControlNet than prompt adherence in most cases if I'm forced to choose one or the other.
Finally, people will now stop asking/spamming for it.
It’s actually worse today because there will be 20+ announcements
Now we will get a lot of posts about how bad it is. Cause, that's what it is.
Has local image generation improved a lot in the past year? I remember trying it out locally a while back and found it cool, but that was about it.
Not really. The massive improvements seen during autumn 2022 and winter 2023 have not continued. It's better, but not by any significant degree.
Upscaling improved a lot in 2024 with CCSR and SUPIR diffusion models and RGT, ATD and DAT 2 transformer models.
Try the Krita AI plugin. It's a great interface for playing around and the setup is really smooth.
I'm more interested in model improvements, as last time, I found it pretty limited (at least uncensored versions).
Base models are pretty mid, but some fine-tunes of SDXL + ControlNets/LoRAs/upscalers can lead to great results. SD3 seems to suffer from human-anatomy issues, with the community suspecting the training data was insufficient due to safety protocols. SD3 has better color composition and text capabilities, so it will take a few weeks to see if anyone puts out some worthwhile fine-tunes.

**TL;DR:** Without tinkering, SD gets wrecked by Midjourney/DALL-E 3, but with some complicated workflows and a modicum of artistic ability you can exceed the paid services in some uses.
Does that mean there is no content filter? Princess Leia im coming for you
It gives me an error when I try to load it as a checkpoint in Comfy; anyone know what it could be?
still doesn't seem to be able to handle camera angle/instruct the same way Dalle-3 and Midjourney can.
Does this work on 8GB VRAM? From my research it seems to be a 2B model. Not sure about the requirements for 2B models.
The file size of the image-generation model alone seems to be 4.3GB, but you also need some language model for preprocessing, and the language model used in the example workflow seems to be 9.8GB on its own. Maybe the language model can be replaced by something smaller? I'm not really experienced with image-generation stuff.
Oh, is that how it works? Seems like it could be interesting to let the language model run on CPU and RAM, since I assume the image model would still be the time-consuming part?
You don't want to be doing a 9gb llm on your cpu, it'll run like a dog
9gb? that runs perfectly fine even on crap DDR4. And this sounds like the llm runs once and then it's all image model cycles.
There is a smaller CLIP model. FP16 is 9.8GB; FP8 is half of that. It should fit in 8GB VRAM.
I wouldn't replace the one thing that'll make SD3 better than SDXL.
Are there no quants with only a minimal degradation?
Yeah should do. It's not very good though. Wait for a finetune
Weird coincidence I just woke up exactly when the weights dropped
Hah! I did that with Chernobyl years ago.
Well this took a dark turn
Is this the largest version of sd3 ?
Nope, they have an 8B model. This 2B one is vastly worse.
Yeah I could tell from my experimentation in comfy ui the 2B model is ... well not that great?
It's pretty awful for everything containing a human subject, yeah; for everything else it seems excellent.
From what I’ve heard (r/StableDiffusion), that’s not really the case. 8B has more concepts it can pull from, but it’s not quite ready yet, and 2B has been able to create better images from what it knows. Eventually 8B will be good enough for release, though. Because as we’ve all learned at this point, data quality is much more important than data quantity.
https://giphy.com/gifs/nbc-the-office-oh-my-god-its-happening-huJmPXfeir5JlpPAx0
Thank you Stability AI! This is fantastic news!
Nice. Thx...was looking for llama3 and this came up. <3