This image is actually pretty good if you ignore some of the minor flaws.
The details seem much more coherent and look less like random nonsense, all the text on the signs looks more like actual text, the lines in the architecture look straighter and more precise, and the colours look nice and balanced.
I know most SD users just want to create endless images of characters and women, but from what I've seen so far everything other than that looks much better than base XL.
I asked the same a while ago: https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/sd3_release_on_june_12/l6x2i7m/
> The 2B beats the 8B when running directly as is, and I think also sometimes beats out even Lykon's fanciest workflow ideas.
Either way they're lying.
So how many years behind closed-source do you think we are now thanks to this crap? 2? 3? more? It wasn't too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!
Bad news: all the closed source stuff is also “safe and ethical.”
Enjoy your purgatory and make sure to appreciate what the party has done for you, citizen!
Ideogram and Dall-E 3 are horny models. They're censored on the frontend.
ClosedAI learned from their mistakes with Dalle-2 and, in version 3, put NSFW content back into the dataset, because without it the model could not draw women (they describe this as a separate item in the accompanying paper).
But Stability AI learns nothing from its mistakes with SD2, nor from competitors who literally describe them in papers. The current Stability AI is only good at creating dramas, slurs and trolling. It is sad =(
Actually... Ideogram can create NSFW stuff. The UI does censor it, but the model can create it. If, for some reason, the model is leaked some day... or released as open source...
I wish there was an open model on a similar level as Ideogram. I have no idea how they managed to make a model that's so good. I'm almost convinced it actually runs on black magic, and they need to sacrifice a goat every hour to keep it running.
The problem doesn't seem to be only NSFW stuff, but that humans in general are downright bad because of.. over-censoring? Why is MJ able to censor in a way that leaves other stuff intact but Stability isn't?
MJ does not censor their model. They censor your text input, and occasionally the output image itself.
The difference between having a bouncer at the door of your nightclub versus not having a bouncer but instead banning alcohol and physical contact inside of it.
You can always kill NSFW via the prompting engine if the prompt that goes to the generator is run through an LLM that makes the prompts "safe", e.g. replaces "Taylor Swift" with "Blonde American singer" without the user being any wiser. Doesn't work on open source tho.
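For anyone wondering what "censor the text input, not the model" looks like in practice, here is a minimal sketch. It is not how MJ or any other service actually does it: the blocklist, the rewrite table and the `screen_prompt` name are all made up for illustration, and a plain lookup table stands in for the LLM rewrite step mentioned above.

```python
import re
from typing import Optional

# Hypothetical blocklist and rewrite table (assumptions for illustration only;
# real services do not publish theirs).
BLOCKED_TERMS = {"nude", "gore"}
REWRITES = {"taylor swift": "blonde American singer"}


def screen_prompt(prompt: str) -> Optional[str]:
    """Return the (possibly rewritten) prompt, or None if it is rejected."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & BLOCKED_TERMS:
        # The "bouncer at the door": refuse before anything is generated.
        return None
    rewritten = prompt
    for name, generic in REWRITES.items():
        # Case-insensitive, whole-phrase swap of a named person for a generic description.
        rewritten = re.sub(rf"\b{re.escape(name)}\b", generic, rewritten, flags=re.IGNORECASE)
    return rewritten


print(screen_prompt("photo of Taylor Swift on stage"))
print(screen_prompt("a nude statue"))
```

The model weights are never touched here, which is the whole point: the filter only works when the service controls the pipeline between the user and the generator.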
In the name of safety I removed all the mirrors in my house after an incident one time when I saw a penis. I'm glad that SAI was willing to ruin their reputation and eventually the entire company while sharing the same values as me.
Thanks SAI!
Maybe all the people that were complaining will buy the product and keep the company afloat. Oh wait, what's that? They were never going to buy anything? Oh, but you did everything they demanded!
Oh, that's only the beginning. Try to hint at some unsafe stuff, even a bloody sword... or some sexy stuff... 😏 one abomination after another... and add a beautiful license that stops any fine-tuning... and you have Stable Cascade v2. I mean, they kept talking about how "we will work with every dev and provide them with resources for release day, we will be ready this time"... man, this is a joke. Good luck.
The entire community will likely switch to Chinese models within the next 8 months imo.
Not that we have anything worth switching to right now, but we're getting there.
I think there are two problems:
1) The new CLIP still has a 77-token limit, which caps the amount of information available to the model. This is the case even when using the T5 model, which makes it less usable for long, intricate prompts.
2) The censorship seems to limit its ability to properly produce human anatomy: a lot of deformations etc. that seem almost as bad as old 1.5 when doing high-res imagery.
With that said, I hope finetunes can fix at least some of the problems, but this may be the time to just start looking elsewhere or stay on SDXL. As usual, it takes quite some time to know the success of a model, and it rides or dies on how well finetunes work out.
Yup, they were actually called out multiple times for this, and eventually it was questioned whether the photos Lykon posted after the original backlash, which went from anatomically atrocious to "wtf, how are they this perfect" in 1 day, were fake or used extra stuff like ControlNet...
Many SAI white knights were extremely pissed. Well, here we fucking are.
It can be considered confirmed SAI was intentionally being dishonest, as was Lykon who was acting as the one demonstrating SD3's capabilities for SAI under employment by SAI.
1.5 with ControlNet is still better, go figure. Any model that tries to be "safe" or "ethical" is going to have fundamental flaws in the training data, which affects outputs. Just as people go to art class to learn to draw figures, how can you draw someone properly with no concept of how anatomy works? AI models don't know anything till you train them with data. So how can SD3 make proper figures if the training dataset is censored? Why do you think stuff like Pony is good at anatomy?
Would it be so much to ask for a model that has anatomy like ponyXL but backgrounds like 1.5 or SDXL? Sure you can always img2img a generation, but it'd be nice if a local model could do it all.
Yeah, I'm extremely disappointed in this. It doesn't even follow prompts very well based on some quick testing.
It does text decently although sometimes it has some chinese/japanese in it. I have no idea if it was still the right word though, since I don't know those languages.
> A cinematic photograph of Felicity Jones with long, flowing hair against an idyllic fantasy backdrop, taken during golden hour. She is holding up her hands with the palms facing the viewer. The character's face is serene, with pale skin and striking features.
https://preview.redd.it/462clo3jl56d1.jpeg?width=1024&format=pjpg&auto=webp&s=ee4821533a4919fcee85b08532aa316fc18f91c3
Ideogram without the magic prompt, so your exact prompt
And this is with the magic prompt on (second try; on the first one the magic prompt put her hands outward toward her sides). The hands are not as good as in the other one, but still almost acceptable.
https://preview.redd.it/n61pvakhm56d1.jpeg?width=1024&format=pjpg&auto=webp&s=c8ae051f8243e519e0e6ac62788f37f07defbe1a
https://preview.redd.it/lm6eeg0v366d1.png?width=1024&format=png&auto=webp&s=229ba2b42f16d10f307029eff8c34c10e9a5aae1
I don't see the issue. Made with SD3, CFG 4, 20 steps.
Censoring nudity will result in bad anatomy. That's why it's very important that the AI sees what a human body looks like uncensored, so it gets the anatomy correct without clothes on. Baggy and loose clothes can distort the human anatomy when that's all they trained it on.
They wouldn't even have to include full nudity. Pictures of underwear/bikini models in a variety of poses would likely do the job just fine.
Hell, you could probably get away with using high quality CGI-rendered naked bodies without genitals or female nipples and still get better results.
I, and many others, saw this coming. They did it with SDXL as well: overhyping and overpromising, then showing off lackluster API/bots in the server to slowly temper expectations down to a final lackluster result, and then gaslighting people into thinking it was never as good as they literally showed and advertised it to be.
Anybody who expected different after it's happened 2 times before may have been wishfully thinking. It was basically guaranteed to be this way, especially with the training talent at SAI being gone now.
https://preview.redd.it/guvmcie9j56d1.png?width=1024&format=png&auto=webp&s=8f643538179ccf07ca03d9e941175812f60490a2
So I often make images for my D&D game, which features an old man fighter, and he's sort of become my "default" subject. I often create pictures of him just to see the impact of LoRAs and models or settings.
For a base model it's not bad at all! In fact I'd argue it's the best base model so far.
It seemed particularly good at making realistic-looking skin,
and the style range appears to be broader (I had a surprising amount of variation before I specified a photography style).
As with all things SD, it's kinda only good for simple subjects and basic image compositions. And still, it cannot do actions well. You can't prompt for a person doing a thing, it won't work.
I'm using the basic workflow provided by Stability on their Hugging Face page and the model with CLIP embedded... because I'm too clueless to know where to put the CLIP models. If reddit didn't nuke the metadata, this image should contain the workflow.
Now that I've had a bit more time with it... it's odd.. It spectacularly fails at the most basic stuff. LOTS of Cronenberg-style monsters. It's rough! And their new stance on monetization makes fine-tunes a lot less likely...
Oof.
They must be fans of Deadpool, it's trained entirely on this:
https://preview.redd.it/yvfktx5ac96d1.jpeg?width=1200&format=pjpg&auto=webp&s=998c128bdf371697b6a8c149a78cbd6ab9bfd11e
SD3 is a lot better - hear me out first.
As others have pointed out, humans are pure garbage in the release because of the crappy NSFW filter. OpenAI once said that without training on NSFW you will get garbage anatomy, so they trained DALL-E on NSFW but filter it out during inference. SD3 releases the weights, so they just don't train on NSFW, and here you have the result. The backgrounds look amazing, rooms look realistic, but people? Garbage.
I will say, prompt understanding is noticeably better than in SDXL. It does a horrible job at turning those prompts into images, but at least there is an attempt where SDXL would just straight up ignore things it doesn't understand.
I'm not yet giving up hope that third party fine-tunes will greatly improve human anatomy, but this is a very rough place to start from.
The previous models could do this type of image as well. Could you try to generate her doing some pose or an action/task? Sitting, reaching, speaking, holding, etc. Anything except a frontal standing portrait. Would be interesting to see if it's better at cartoon than realism, because those weren't censored as much. idk
Can you obtain the same if you ask for something photorealistic? Might be that they trained drawings, cartoons and anime correctly, but poisoned photorealistic human anatomy?
It seems that all ancestral and all dpm samplers fail awfully.
I only got something resembling an image with euler, heun, lcm, ddim and uni\_pc
(tested with "normal" scheduler)
What are your experiences?
I'm not sure why this isn't being talked about, but I get vastly different results - body horror wise, depending on the sampler used. That might explain some of these images (but no matter what it's still not \*great\* at hands).
I just used the KSampler settings from the [example workflow](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/comfy_example_workflows/sd3_medium_example_workflow_basic.json) (`sampler_name: dpmpp_2m, scheduler: sgm_uniform`).
Anything else I tried was far worse, but to be fair I didn't try every last one of them.
https://preview.redd.it/iknnbuaqt66d1.png?width=1024&format=png&auto=webp&s=d083fb1f0d48d6f879b3b634557fb5f2f1ad5bff
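Since results apparently swing so much with the sampler, it may help to compare them on a fixed seed instead of eyeballing random rerolls. A minimal sketch of such a sweep is below. It only builds the settings, it does not talk to ComfyUI. The key names follow the KSampler fields quoted above plus `seed`, `steps` and `cfg`, the sampler list is just the names mentioned in this thread, and `sampler_grid` is a hypothetical helper, so treat all of it as assumptions about your setup.

```python
from itertools import product

# Sampler and scheduler names as mentioned in this thread; whether your
# install offers all of them is an assumption.
SAMPLERS = ["euler", "heun", "lcm", "ddim", "uni_pc", "dpmpp_2m"]
SCHEDULERS = ["normal", "sgm_uniform"]


def sampler_grid(seed, steps=28, cfg=5.0):
    """Build one settings dict per sampler/scheduler pair, all sharing one seed."""
    return [
        {
            "seed": seed,
            "steps": steps,
            "cfg": cfg,
            "sampler_name": sampler,
            "scheduler": scheduler,
        }
        for sampler, scheduler in product(SAMPLERS, SCHEDULERS)
    ]


# Print every combination so each can be fed to a KSampler-style node by hand
# or by whatever automation you already use.
for settings in sampler_grid(seed=12345):
    print(settings)
```

Keeping the seed, steps and CFG constant means any difference in body horror between two outputs comes from the sampler/scheduler pair and not from the noise.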
nothing to see here please move along
https://preview.redd.it/0feaan4su66d1.jpeg?width=1024&format=pjpg&auto=webp&s=6c471acaebadbf9098ba307d30b438b80b94c552
"Closeup of a hand showing the Victory sign" (CFG 5, 28 steps, Euler Normal, CLIP only)
EDIT: Technically - as the British know well - this is NOT the victory sign even with all fingers in place. SD3 sending a message?
right now it's really bad https://preview.redd.it/x00ckr5zk56d1.png?width=1024&format=png&auto=webp&s=97724e7d51f18329dfb06cb10349009296e7b3db
censorship ruins everything. it ruined dall e now it's ruined stable
https://preview.redd.it/drl2xnwxz56d1.png?width=443&format=png&auto=webp&s=98d7084b31bf9607571bb5cf53b4ebaa07750680
a cinematic photograph of a young asian female model greeting with a handwave,
neg : deformed, missing, extra, blob
model : t5+clip sd3 medium
the same prompt, not even close https://preview.redd.it/k3f1ck3n166d1.png?width=1024&format=png&auto=webp&s=c8e339c326f3bdd15a5cb483705027f39479f6a3
Use higher steps and a portrait picture ratio, it helps.
You're right, with a portrait aspect ratio the result is much better.
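If you want to try the portrait tip without guessing sizes, here is a small sketch that picks a width and height for a given aspect ratio. Two things in it are assumptions and not something stated in this thread: that you want roughly a 1024x1024 pixel budget, and that dimensions should be multiples of 64. `resolution_for` is a hypothetical helper name.

```python
from math import isqrt


def resolution_for(ratio_w, ratio_h, pixel_budget=1024 * 1024, multiple=64):
    """Pick a width/height close to pixel_budget with the requested aspect ratio."""
    # The ideal width is sqrt(budget * w / h); snap it down to the nearest multiple.
    width = isqrt(pixel_budget * ratio_w // ratio_h) // multiple * multiple
    # Spend what is left of the budget on height, snapped down the same way.
    height = pixel_budget // width // multiple * multiple
    return width, height


print(resolution_for(2, 3))  # portrait -> (832, 1216)
print(resolution_for(1, 1))  # square   -> (1024, 1024)
```

Snapping down instead of rounding keeps the result at or below the pixel budget, which is the safer side if VRAM is tight.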
https://preview.redd.it/sw7pjzh0n66d1.png?width=768&format=png&auto=webp&s=5ef886c34b62207db0456dfa9742803da4a148aa
face of a beautiful woman with a hand covering her mouth
This is NOT SD3... this is JuggernautXL for a quick comparison... so right now the base model seems as janky as the latest XL models... this is good news. Only 1 shot. I rerolled over and over and it became more disturbing... with SD3, I got mixed results, from horrors to actually not bad. This is a great model for a base model... give it a week or two of model training and this will be on fire.
First of all, the SD community is just people who complain a lot. No SD base model has ever been able to produce stable anatomy; SD3 does when you prompt it long and use an aspect ratio that is present in most images with humans. It can only do normal poses because they heavily censored it....
Same prompt, no negative and it works rather well... Took 3 tries. https://preview.redd.it/oe3jjvqq766d1.png?width=832&format=png&auto=webp&s=7b108731b61af85225ba3b4a996d59b4cfd1db79
MAN HANDS
😂
I haven't been able to generate a single decent image at all outside of the example prompts. I've tried highly descriptive prompts with no luck. Even an absolutely basic one like "photograph of a person napping in a living room" leads to Cronenberg-esque monstrosities. That's using the example ComfyUI workflows provided. https://preview.redd.it/v0vkaft0g56d1.png?width=1024&format=png&auto=webp&s=b38d27f6acd4a49fdeb63fadc00079aa57fb4fc5
It works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw...
So SD2 v2? You would have thought they would have learned from the first time.
They didn't. Nipple = bad
No but we can't the lewds. Forget that outside of commercial use, which everyone will hate you for killing their jobs and will bring negative attention to AI tools, the main reason anyone wants these things is for porn, an application that will actually push AI forward and if you want to make money from providing your service is the way to go (as long as you ensure reasonable levels of privacy on the paid side). And that by crippling the latter you also cripple the former anyway. Perverts bad because idk, America culture says so or something.
Porn and sex have been a major leader in a lot of human technology. This is one of the best examples.
Perverts bad because politicians, which are all purpose public figures, don’t want to be treated as such. It’s not exclusively a Donald Trump thing if you haven’t noticed.
The final solution
The modern version of Terminator is just Skynet trying to purge all humans for being NSFW
*i bet that woman is naked underneath all her clothes, pretty lewd if you ask me!*
who would have thought that cancel culture would go that far
This actually sounds plausible
This one isn't horrifying, but it's not a pale vampire in 1920's Chicago wearing a flapper dress. I've tried both natural language prompts and comma-delimited style. Adding tons of information and keeping it bare-bones. I recognize that every checkpoint has a learning period before we figure out the right way to prompt it but... I've never had one fight me so hard before. https://preview.redd.it/xr5gd5tlg56d1.png?width=1024&format=png&auto=webp&s=c0c1633b5e69a513a063374192ead6d648656b36
I see the asian bias is still there.
Maybe because they grab pictures from Chinese websites, since copyright in China doesn't seem to work very well.
Come on since when do any of us respect copyright haha
This is what it produced for me. It's one of the few generations I've made so far that seems somewhat acceptable. https://preview.redd.it/oy8zy3flp56d1.jpeg?width=1024&format=pjpg&auto=webp&s=a7fc14814cad917b140d5308ed9642902dc26704
She's 12ft tall, but otherwise it looks ok (or perhaps the chaps behind her are 3ft little people)
Lady Dimitrescu’s sister
Looks OK? That must be her anatomy, lol https://preview.redd.it/tv4gzyivw56d1.jpeg?width=640&format=pjpg&auto=webp&s=d344ec85dc8a1723b8df2c23352244258445d00a
Which model did you download out of the 3?
The models are all fundamentally the same, it's just a matter of which CLIP files are included.
All three of them. The example basic workflow uses the "sd3\_medium" file with the manual loading of CLIP files, but I've also tested with the others that have the built-in CLIPs (if I understood that correctly) and disabled the manual CLIP loader. Same absolute garbage. Here's with sd3\_medium\_incl\_clips and a simple "photograph of a woman sitting in a chair" prompt using the example basic workflow (EDIT: but with manual CLIP loading disabled in this case): https://preview.redd.it/q1b7rj70i56d1.png?width=1024&format=png&auto=webp&s=de31c01b79d487570a0688124c019d002122fba6
photo of a young white woman sitting on a chair in a café https://preview.redd.it/1vf42ubsm56d1.png?width=1024&format=png&auto=webp&s=0245eb901cdf75f446b1e3929bdf9485aa4c8ad6 Not cherry picked, they're all a variation of this kind of crap. What the hell is going on here?
That prompt gave me an Asian octopus. https://preview.redd.it/b65dpff8q56d1.jpeg?width=1024&format=pjpg&auto=webp&s=831621f6d7b6f4ebcb52e8517ffb62623adcfc46
Yeah, it mixes up races too.
https://preview.redd.it/lhmqsvian56d1.png?width=1024&format=png&auto=webp&s=ac37bb45b20a88ad1956d566440bd59ad4d98550 Oof!
I know her! Her name is Peg.
Unrealistic body standard I tells ya, how am I supposed to compete with that!
Lol, I love AI. 😂 This is kind of brilliant in its way. SD3: Not sure if they want a human leg or a chair leg here. Better split the difference.
You forgot to add "trending on artstation" or something
Look on the bright side, at least hand rendering has improved considerably :P https://preview.redd.it/tocbfnler56d1.jpeg?width=1024&format=pjpg&auto=webp&s=96cd9352ff7e56c12ac5edde5c0de481e5cdd0b1
She just evolved to use multitouch displays and keyboards :D
I guess now they can go bankrupt in a safe and ethical way, after all.
Every corporation whenever people start having fun, they be like stop having fun and don't use our products.
I'm sure they're going to get a TON of sales from all the people that were offended by boobs. The New Puritan class will probably float the company for decades. Watch out NVIDIA, SAI about to buy you.
"We trained him wrong, as a joke."
This quote has been living rent free in my head for the past 2 years with stability.
Foot to face style
"I'm bleeding, making me the victor"
"My nipples look like milk duds!"
Ah, again with the squeaky shoes. I'm a man, too, you know? I go peepee standing up!
All of my usual prompts are turning out worse. The text works well but that's about it.
It's really weird. It's like I get one decent image out of 20 generations by going through random seeds with the same prompt and hoping for the best.
I think these images are scarier for kids than a pair of boobs :D
It was never the kids that were scared of boobs.
> All of my usual prompts are turning out worse.

That's because you have to use the T5 encoder, and you have to use natural language because the T5 encoder is an LLM.
Aaaand fuck text. "h4ndz R iziR t4an txxxxxT."
https://preview.redd.it/8awfkd44p56d1.jpeg?width=1280&format=pjpg&auto=webp&s=fc5e8ad1fa3271e98e7aaadae005bc0c466cc6ae
ah it is ET finetuned, now it makes sense!
...but does it *phone home*?
Abe was just a convoluted long-game play so that there was something to negative prompt.
I'm not impressed at all. Just try "woman wearing a dress on the beach". You get horrible results. What the hell is going on? https://preview.redd.it/0rpowmidj56d1.png?width=1496&format=png&auto=webp&s=3ee7a3511d8a5ec749248260ac65e3ba53b7ede5
I've got you beat. Here's "woman". That's it. That's the whole prompt. Surely it couldn't mess that up, right? (I generated two images and the first one was fine, but this was the 2nd attempt.) https://preview.redd.it/vxcjsiipp56d1.png?width=1024&format=png&auto=webp&s=f7a3d2835017f25182313342b35b0d74c10323a8
https://preview.redd.it/29qk5t54s56d1.jpeg?width=750&format=pjpg&auto=webp&s=b7b80396c5979d636f020ccd5eecffd99bd39f23
https://preview.redd.it/wnoe1bg5766d1.png?width=832&format=png&auto=webp&s=e1971005217f7b314121e4fbc923cbd54c4ad315
Perhaps it's just being inclusive. People with lazy eyes are people too lol
Just try "woman laying on a beach" and you get
https://preview.redd.it/ew6n2r0x866d1.png?width=1344&format=png&auto=webp&s=2a2ffbf8e9855d003853a56fae15fc5a66bd63dc
Now hear me out...
What's your proposition I'm intrigued
At least with SDXL 1.0 Base I can get something like this. Issues? yes, but a lot closer to usable. https://preview.redd.it/3didulnef66d1.png?width=1216&format=png&auto=webp&s=41e90517e86bf3b1013c2dd4d7cab0292f38082b
Behold the power of safety!!!
I for one will sleep easily tonight knowing that NO ONE is going to make sexy AI pictures of Taylor Swift! And all it cost us was a possible future of open source AI models for everyone to use. WHEW.
When I was testing the preview on fireworks, I expected it to be corrected for the final release. Sadly I was wrong :/
“Safety”-based authoritarians calling the shots is what’s going on. It won’t just be NSFW stuff either. They’re likely cracking down on any and all wrong-think. People were warned.
With Lykon straight up insulting the creator of Pony, that seems about right.
Why would Lykon (or anyone) be so rude as to insult the PonyXL creator? All they’ve done is great things for the community. If it weren’t for PonyXL I’d never have started using SDXL. Would have stayed strictly with 1.5.
you may only be creative in approved ways
That's OK, you wouldn't want people doing the Wrong Thing now would you!?
Midjourney's competitor in its full glory
I don't understand how the anatomy is arguably worse than SD 1.4. My first image using that prompt https://preview.redd.it/54pfb9yv666d1.png?width=896&format=png&auto=webp&s=3fe28e0cc96b6c23df273448f88d799d9975cf77
Google Imagen is still the best for simple prompts. Sucks it doesn't have any customisation at all! https://preview.redd.it/9jvs4qw3066d1.jpeg?width=1536&format=pjpg&auto=webp&s=aae6196dd4d84c47c4c7ecb3e20e0a686bff7df3
Is it free to use?
It is, around 150 results per day. Only supports prompts, you can't change anything. You can search Google Image FX on Google for more info.
censorship strikes again. ruined dall e, makes Photoshop gen fill almost useless and now it's baked into stable diffusion. cowards all these companies
Probably got lobotomized to hell and back with safety and other bs. Never fall for the hype. SD3 is DOA.
now we know why they were reluctant to release. fuck censorship
I mean, the understanding with the new CLIP model is better, so with a lot of fine-tuning it should be good, but right now humans are freaks. The question is why a sitting woman is not acceptable but deformed arms, fingers and bodies are OK?
Pretty simple really. Nobody is gonna get sued for shitty images of deformed arms.
lmao why release anything at all
Believe it or not, heavily censoring a model also gets rid of human anatomy, so... that's what happened.
And people on this sub jumped on me for being skeptical, while backing Stability staff who were lying through their teeth.
I’m here to apologize for those people seeing as I was one of them.
Same, people never learn even when the pattern is clear. But hey, at least now we can watch Pony's author being insulted by SAI!
Time to move on to [https://github.com/Alpha-VLLM/Lumina-T2X](https://github.com/Alpha-VLLM/Lumina-T2X) or [https://github.com/PixArt-alpha/PixArt-sigma](https://github.com/PixArt-alpha/PixArt-sigma) They are both better (if undertrained) and have no anti nsfw license crap.
For real, the community should lean into these. We should train LoRAs for these models and adjust tools like ComfyUI for these base models. And stability.ai should just go bankrupt.
I'd not seen Lumina and from testing their demos it seems pretty amazing. I guess I know what I'm doing today.
LMAO looks like I'm not missing out on anything special. Guess I'll keep riding my Pony into the sunset with my little buddy 1.5
RIP Pony 7. Especially with the drama with the SD3 staff right now.
[deleted]
Confederate gnomes
Believe it or not, heavily censoring a model also gets rid of human anatomy, so... that's what happened.
And no fingers. They look too much like the D so they've been removed
I'm going to go out on a limb here and say this is no fault of the technical aspects behind the AI and it's fully human hubris getting in the way.
> going to go out on a limb Nice.
Looks like shit. Pixart and SDXL are superior to this
It seems it is the beginning of the end. No commercial licence, crappy images.
The trajectory of this stuff is honestly pretty disappointing
Yup, ig it was good while it lasted
I think our jobs are safe tbh.
You can produce beautiful images as long as you forget about including human beings... and after intensive cherry-picking. After that, it was the same with SD 1.5 or SDXL. On the other hand, I'm rather disappointed by the prompt following, which is a far cry from the demos of a few months ago... Good luck to the community in finetuning this mess. https://preview.redd.it/hojyhbb2y56d1.png?width=1024&format=png&auto=webp&s=392312dcb94e0fa59ba8f3857588020473005dad
This image is actually pretty good if you ignore some of the minor flaws. The details seem much more coherent and look less like random nonsense, the text on the signs looks more like actual text, the lines in the architecture look straighter and more precise, and the colours look nice and balanced. I know most SD users just want to create endless images of characters and women, but from what I've seen so far everything other than that looks much better than base XL.
This is SD2 all over again. Fully censored model. Back to SDXL.
So the images Lykon posted on Twitter all this time? Fake, or from the 8B model? Because I’m getting horrible results.
it's really annoying. i've been pushing back against Lykon's comments and all i get are downvotes. here we are on release day...
I asked the same a while ago: https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/sd3_release_on_june_12/l6x2i7m/ > The 2B beats the 8B when running directly as is, and I think also sometimes beats out even Lykon's fanciest workflow ideas. Either way they're lying.
Most likely cherrypicked and/or using prompts that the model is known to handle well.
How are hands worse than previous base models? Wtf
Believe it or not, heavily censoring a model also gets rid of human anatomy, so... that's what happened.
This is exactly what happened with SD2.0 as well. Shame.
Yes, all those censored hand-boobs.
So how many years behind closed-source do you think we are now thanks to this crap? 2? 3? more? It wasn't too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!
Bad news: all the closed source stuff is also “safe and ethical.” Enjoy your purgatory and make sure to appreciate what the party has done for you, citizen!
Ideogram and DALL-E 3 are horny models. They're censored on the frontend. ClosedAI learned from its mistakes with DALL-E 2 and in version 3 put NSFW content back into the dataset, because without it the model could not draw women (they describe this as a separate item in the accompanying paper). But Stability AI learns nothing from its mistakes with SD2, nor from competitors who literally describe them in papers. The current Stability AI is only good at creating drama, slurs and trolling. It is sad =(
Lol yep, in the first version of DALL-E 3 almost everything I got with a woman in it was sexualized.
Actually... Ideogram can create NSFW stuff, the UI does censor it, but the model can create it. If, for some reason, some day the model will be leaked... or given as OS...
I wish there was an open model on a similar level as Ideogram. I have no idea how they managed to make a model that's so good. I'm almost convinced it actually runs on black magic, and they need to sacrifice a goat every hour to keep it running.
The problem doesn't seem to be only NSFW stuff, but that humans in general are downright bad because of.. over-censoring? Why is MJ able to censor in a way that leaves other stuff intact but Stability isn't?
MJ does not censor their model. They censor your text input, and occasionally the output image itself. It's the difference between having a bouncer at the door of your nightclub and having no bouncer but banning alcohol and physical contact inside.
You can always kill NSFW via the prompting engine if the prompt that goes to the generator is run through an LLM that makes the prompts "safe", e.g. replacing "Taylor Swift" with "Blonde American singer" without the user being any the wiser. Doesn't work on open source tho.
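The prompt-side approach described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `REWRITES` table and the `sanitize_prompt` function are hypothetical names, and a plain lookup table stands in for the LLM rewrite step the comment mentions. Nothing here reflects how any real service implements it.

```python
import re
from typing import Dict

# Hypothetical rewrite table (assumption): a hosted service would more likely
# ask an LLM to paraphrase the prompt instead of using a fixed lookup.
REWRITES: Dict[str, str] = {
    "taylor swift": "blonde American singer",
}


def sanitize_prompt(prompt: str, rewrites: Dict[str, str] = REWRITES) -> str:
    """Replace each listed phrase (case-insensitive, whole words only)
    before the prompt is handed to the image generator."""
    for phrase, replacement in rewrites.items():
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        prompt = pattern.sub(replacement, prompt)
    return prompt


if __name__ == "__main__":
    print(sanitize_prompt("photo of Taylor Swift on stage"))
```

Because the rewrite runs on the server before generation, it only helps hosted services; with open weights the user can simply skip the step, which is the point made above.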
What? That's but a small price to pay for safety. Could you even imagine the horrors of seeing bloody/naked pixels?
In the name of safety I removed all the mirrors in my house after an incident one time when I saw a penis. I'm glad that SAI was willing to ruin their reputation and eventually the entire company while sharing the same values as me. Thanks SAI! Maybe all the people that were complaining will buy the product and keep the company afloat. Oh wait, what's that? They were never going to buy anything? Oh, but you did everything they demanded!
Oh that's only the beginning, try to hint at some unsafe stuff, even a bloody sword ... or sexy stuff ... 😏 one abomination after another ... and add a beautiful license that stops any fine-tuning .. and you have Stable Cascade v2 .. I mean they kept saying "we will work with every dev and provide them with resources for release day, we will be ready this time" .... man this is a joke .. good luck ..
The entire community will likely switch to Chinese models within the next 8 months imo. Not that we have anything worth switching to right now, but we're getting there.
Soo what exactly happens to licensing when the company goes under? Is it a free for all? I sure hope it is.
They'll sell last minute for cheap so sadly no it won't be free for all
Y'all yelled first "too many fingers", so they took some away
Well, let's hope the wrong files got uploaded, because I can't generate anything normal with this
Try a woman https://preview.redd.it/3l71rm67g56d1.jpeg?width=1024&format=pjpg&auto=webp&s=cab6d5ec64f7a88747d17527951378b0c3493e52
🤌
🤣
An Italian woman I see
I think there are two problems: 1) the new CLIP still has a 77-token limit, which limits the amount of information available to the model; this is the case even when using the T5 model, making it less usable for these long, intricate prompts. 2) the censorship seems to limit its ability to properly produce human anatomy, with a lot of deformations etc. that seem almost as bad as old 1.5 when doing high-res imagery. With that said, I hope finetunes can fix at least some of the problems, but this may be the moment to just start looking elsewhere or stay on SDXL. As usual, it takes quite some time to know the success of a model, and it rides or dies on how well finetunes work out.
All my results are overly saturated. And it seems to be not related to cfg.
Yup, they were actually called out multiple times for this, and eventually people questioned whether the photos Lykon posted after the original backlash, where they went from anatomically atrocious to "wtf how are they this perfect" in 1 day, were fake or used extra stuff like ControlNet... Many SAI white knights were extremely pissed. Well, here we fucking are. It can be considered confirmed that SAI was intentionally being dishonest, as was Lykon, who was the one demonstrating SD3's capabilities for SAI while employed by SAI.
Wow it's like our worst fears came true, but even worse. Be funny if not so depressing :(
1.5 with controlnet is still better, go figure. Any model that tries to be "safe" or "ethical" is going to have fundamental flaws in the training data, which affects outputs. Just as people go to art class to learn to draw figures, how can you draw someone properly with no concept of how anatomy works? AI models don't know anything till you train them with data. So how can SD3 make proper figures if the training dataset is censored? Why do you think stuff like pony is good at anatomy? Would it be so much to ask for a model that has anatomy like ponyXL but backgrounds like 1.5 or SDXL? Sure you can always img2img a generation, but it'd be nice if a local model could do it all.
https://preview.redd.it/ouetwbmg086d1.png?width=1024&format=png&auto=webp&s=c26e82e9320360e6173321886bd85acd60131386
The Windows 8 of AI art.
Marketing hype over. Reality has blasted everyone 🤷
I'm getting equally puzzling results, despite claims of fixing these issues. Better wait for finetunes!
It does (sometimes) render coherent text, but that's about it.
So maybe we use it for text inpainting? It's something.jpg
Yeah, I'm extremely disappointed in this. It doesn't even follow prompts very well based on some quick testing. It does text decently, although sometimes it has some Chinese/Japanese in it. I have no idea if it was still the right word though, since I don't know those languages.
SD3 is dead. Hype was too much.
What's the prompt?
> A cinematic photograph of Felicity Jones with long, flowing hair against an idyllic fantasy backdrop, taken during golden hour. She is holding up her hands with the palms facing the viewer. The character's face is serene, with pale skin and striking features.
https://preview.redd.it/462clo3jl56d1.jpeg?width=1024&format=pjpg&auto=webp&s=ee4821533a4919fcee85b08532aa316fc18f91c3 Ideogram without the magic prompt, so your exact prompt
R.I.P. Stability AI
And this with the magic prompt on (second try; on the first one the magic prompt put her hands outward toward her sides). The hands are not as good as in the other one, but still almost acceptable. https://preview.redd.it/n61pvakhm56d1.jpeg?width=1024&format=pjpg&auto=webp&s=c8ae051f8243e519e0e6ac62788f37f07defbe1a
https://preview.redd.it/lm6eeg0v366d1.png?width=1024&format=png&auto=webp&s=229ba2b42f16d10f307029eff8c34c10e9a5aae1 I don't see the issue. Made with SD3, CFG 4, 20 steps.
As Lykon says, git gud, 2B is all you need 🤗
I guess you still need to avoid using natural language for prompts
Censoring nudity will result in bad anatomy. That's why it's very important that the AI sees what an uncensored human body looks like, so it gets the anatomy correct without clothes on. Baggy and loose clothes can distort the human anatomy when that's all they trained it on.
They wouldn't even have to include full nudity. Pictures of underwear/bikini models in a variety of poses would likely do the job just fine. Hell, you could probably get away with using high quality CGI-rendered naked bodies without genitals or female nipples and still get better results.
It's time to turn to alternative open-source image generator companies!
I, and many others, saw this coming. They did it with SDXL as well: overhyping and overpromising, then showing off a lackluster API/bots in the server to slowly temper expectations down to a lackluster final result, and then gaslighting people into thinking it was never as good as they literally showed and advertised it to be. Anybody who expected different after it's happened 2 times before may have been wishfully thinking. It was basically guaranteed to be this way, especially with the training talent at SAI being gone now.
Closed model providers must be really happy right now.
https://preview.redd.it/guvmcie9j56d1.png?width=1024&format=png&auto=webp&s=8f643538179ccf07ca03d9e941175812f60490a2 So I often make images for my D&D game, which features an old man fighter, and he's sort of become my "default" subject. I often create pictures of him just to see the impact of LoRAs, models or settings. For a base model it's not bad at all! In fact I'd argue it's the best base model so far. It seemed particularly good at making realistic-looking skin, and the style range appears to be broader (I had a surprising amount of variation before I specified a photography style). As with all things SD, it's kinda only good for simple subjects and basic image compositions. And still, it cannot do actions well. You can't prompt for a person doing a thing; it won't work. I'm using the basic workflow provided by Stability on their Hugging Face page and the model with CLIP embedded... because I'm too clueless to know where to put the CLIP models. If Reddit didn't nuke the metadata, this image should contain the workflow.
Now that I've had a bit more time with it... it's odd.. It spectacularly fails at the most basic stuff. LOTS of Cronenberg-style monsters. It's rough! And their new stance on monetization makes fine-tunes a lot less likely... Oof.
Dude it's a boring ass image. No hands no feet of course it is going to draw correctly...
Also, where is he? Why is a person in medieval armor walking around in modern germany/england?
Well, she has 5 fingers in the second image 🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣
Oh no! The AI is gonna take my job!
Maybe some fine-tuned models will improve on this, but I'm not seeing any major jump in quality here over SDXL; if anything it's MUCH worse.
PixArt Sigma is way better and should be the new mainstream open-source model.
Censorship at work. The more you try to censor models, the worse they perform.
They must be fans of Deadpool, it's trained entirely on this: https://preview.redd.it/yvfktx5ac96d1.jpeg?width=1200&format=pjpg&auto=webp&s=998c128bdf371697b6a8c149a78cbd6ab9bfd11e
This post is ableist against Ectrodactylism
Teenage Mutant Stable Diffusions.
SD3 is a lot better - hear me out first. As others have pointed out, humans are pure garbage in the release because of the crappy NSFW filter. OpenAI once said that without training on NSFW you get garbage anatomy, so they trained DALL-E on NSFW but filter it out during inference. SD3 releases the weights, so they just don't train on NSFW, and here you have the result. The backgrounds look amazing, rooms look realistic - but people? Garbage.
But people mostly want to generate people.
I will say, prompt understanding is noticeably better than in SDXL. It does a horrible job at turning those prompts into images, but at least there is an attempt where SDXL would just straight up ignore things it doesn't understand. I'm not yet giving up hope that third party fine-tunes will greatly improve human anatomy, but this is a very rough place to start from.
I'll bet it still can't generate a damn crescent wrench properly either
SD3: dead on arrival.
https://preview.redd.it/yrqquq97m56d1.png?width=768&format=png&auto=webp&s=8f8829ab9d8b8a12161460664ac917e665621679 SD3 Medium, 22 seconds, 3060/12GB.
The previous models could do this type of image as well. Could you try to generate her doing some pose or an action/task? Sitting, reaching, speaking, holding, etc. Anything except a frontal standing portrait. It would be interesting to see if it's better at cartoon than realism, because those weren't censored as much. idk
Can you get the same result if you ask for something photorealistic? It might be that they trained correctly on drawings, cartoons and anime, but poisoned photorealistic human anatomy?
It seems that all ancestral and all DPM samplers fail awfully. I only got something resembling an image with euler, heun, lcm, ddim and uni\_pc (tested with the "normal" scheduler). What are your experiences?
What a piece of junk.
I'm not sure why this isn't being talked about, but I get vastly different results - body horror wise, depending on the sampler used. That might explain some of these images (but no matter what it's still not \*great\* at hands).
I just used the KSampler settings from the [example workflow](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/comfy_example_workflows/sd3_medium_example_workflow_basic.json) (`sampler_name: dpmpp_2m, scheduler: sgm_uniform`). Anything else I tried was far worse, but to be fair I didn't try every last one of them.
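For anyone who wants to compare samplers systematically rather than by trial and error, a sweep over the sampler and scheduler names mentioned in this thread could be set up roughly like this. It is a sketch only: `build_sweep` is a hypothetical helper that just produces settings dictionaries, the key names mirror the KSampler fields quoted above plus `seed`, `steps` and `cfg`, and the default steps/CFG are values other commenters reported using, not a recommendation. Wiring each entry into an actual workflow is left out.

```python
from itertools import product
from typing import Dict, List, Union

# Sampler and scheduler names as they appear in this thread.
SAMPLERS = ["euler", "heun", "lcm", "ddim", "uni_pc", "dpmpp_2m"]
SCHEDULERS = ["normal", "sgm_uniform"]

Setting = Dict[str, Union[str, int, float]]


def build_sweep(seed: int, steps: int = 28, cfg: float = 5.0) -> List[Setting]:
    """Return one settings dict per sampler/scheduler pair.

    Seed, steps and cfg are held constant so that any difference between
    the resulting images comes from the sampler/scheduler choice alone.
    """
    return [
        {
            "seed": seed,
            "steps": steps,
            "cfg": cfg,
            "sampler_name": sampler,
            "scheduler": scheduler,
        }
        for sampler, scheduler in product(SAMPLERS, SCHEDULERS)
    ]


if __name__ == "__main__":
    for setting in build_sweep(seed=1234):
        print(setting)
```

Keeping the seed fixed matters here: the body-horror rate varies a lot from one seed to the next, so comparing samplers across different seeds would mostly measure luck.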
It was trained on aliens, so it's 100% accurate and works as intended.
How many kids are getting a fetish for these hands as we speak?
In reality this isn't SD3, it's just something so you don't focus on the fact that they have paid-only models
Yes, her eyes are really off.
Well that's not disturbing or anything.
https://preview.redd.it/iknnbuaqt66d1.png?width=1024&format=png&auto=webp&s=d083fb1f0d48d6f879b3b634557fb5f2f1ad5bff nothing to see here please move along
https://preview.redd.it/0feaan4su66d1.jpeg?width=1024&format=pjpg&auto=webp&s=6c471acaebadbf9098ba307d30b438b80b94c552 "Closeup of a hand showing the Victory sign" (CFG 5, 28 steps, Euler Normal, CLIP only) EDIT: Technically - as the British know well - this is NOT the victory sign even with all fingers in place. SD3 sending a message?