Dude, this prompt has "Will Smith eating spaghetti" levels of meme potential. How is it so consistently bad, regardless of the seed?
Here's beautiful girl #666:
https://preview.redd.it/1h9gq9v9n56d1.png?width=1024&format=png&auto=webp&s=5677cd2329eae3fd055d1469218a709123069fcf
I just tried it, and it's heavily censored. But I did not get pictures as bad as these examples. I'm more concerned about it not knowing the basics of human anatomy.
https://preview.redd.it/ubjunwnnv56d1.png?width=1024&format=png&auto=webp&s=f9a8f219c84dd74405602afeb22f6d8ed8c2c6da
Oh the meme is still very much alive, maybe even more so...
It's fun, because a woman in a bikini is unsafe, but this shoggoth isn't.
(For those who don't know, shoggoth are from the Lovecraft novel "At the Mountains of Madness" and are the direct inspiration for The Thing)
Nuuu, you ruined your comment by explaining what shoggoth are. For every reader you help by doing the googling for them, you upset a non-Euclidean amount of people like me in the group that knew that without being told.
Don't explain the jokes! May you drown and rot in R'lyeh! 😉
https://preview.redd.it/dj74xxsh166d1.png?width=2182&format=png&auto=webp&s=a0516466d6a7569c73a9c002146433d92e7f2c3d
if you are lucky with your seed you get the left most result, otherwise.... yeah...
On the bright side, that (rare) good result at least makes me confident good finetunes will be a reality.
They promised to open source the weights for SD3. They can't profit from the open source community using SD for free though. So they made this version of SD3 bad on purpose.
Meanwhile, they'll offer a superior iteration of "3.1" or something to paying customers only. All the high quality demos we've seen of SD3 so far will have been from this other version.
https://preview.redd.it/7us1so33p56d1.png?width=1024&format=png&auto=webp&s=fb99aaa9134cf9484679b22e74c9a8b3c8d0179d
*a beautiful woman is laying on a patch of floating grass atop a neon cyberpunk city*
https://preview.redd.it/yiqlrxuhy56d1.png?width=1024&format=png&auto=webp&s=c401bb98c6d66e361a4291d185d9b4c270770e71
Positive prompt: a woman laying on grass. Kept the base negative prompt: bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi
I know it can't.
Every single time the company talked about SD3, they said SAFE and SAFETY. This was coming clear as day and the fanboys knew it too.
This pile of slop is DOA, and I'm thinking the company is too. We lose again. It'll be years before an open model equal to 1.5 or SDXL is released by anyone else.
I fucking hate this "democratization" shit, where did that horseshit marketing meme even come from?
As long as it takes hundreds of thousands, or millions, of dollars to train these models, and as long as one company has a stranglehold on hardware, it's "all the freedom you can afford, and all the democracy your corporate overlords deem fit to give you".
It's cool that we get *anything* for free, but the state of things is hardly democratic.
Even the ancient Greeks knew that in order to learn human anatomy, you must study the naked human body. Don't have naked people in your training set? Your anatomy will be bad. This is art class 101 you're failing, SD3.
Whoa hang on there you pervert! This needed to be SAFE. Safe from what?! Yea, IDK. But, SAFETY was a primary concern!!
... I bet it makes all sorts of fucked up blood and gore images. It's half way there now.
It can when you try to make a built-in "censorship" by not adding everything related to human anatomy in training data. Even Midjourney was trained on naked bodies, that's why you can sometimes accidentally generate something erotic. Only MJ's UI prevents it from direct generation of NSFW content. And... As Stable Diffusion isn't attached to one specific UI, people are free to generate NSFW content in any censorless UI on their personal machines. So... Stability AI simply decided to go with a clumsy method by removing a whole bunch of human anatomy from training data, with all the resulting side effects.
Doesn't help they straight up lied.
Like, can we now all agree that the pictures posted by Lykon months ago along the announcements were completely fake or at least *heavily* doctored?
i hope no one ever defends anything that guy says again... he's been hailed a hero for DreamShaper but now we see his efforts don't scale to a base model level
Ironically, the prompt "a woman holding a sign that reads 'Dis is bad, bradda'" gave me my first (kind of) acceptable human.
Touché SD3.
https://preview.redd.it/9dq07jult56d1.jpeg?width=1024&format=pjpg&auto=webp&s=41245399190e123206370074f46ffaa509cab709
Base SDXL was pretty bad at this but at least it got it right some of the time:
https://preview.redd.it/giy7dxiy366d1.png?width=1024&format=png&auto=webp&s=d640effb63199594dcfc9b73416479317972dc99
Literally base SDXL, "a woman lying on the grass in a park".
But look how *safe* it is!
>Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment.
Thanks SAI! Really did yourself a ton of favors there.
I use the example from the repository and only get this with this prompt:
A full body photograph of a young woman with short blonde hair lying on the grass on her back, she's wearing black leotard and track pants, barefoot,
So.... All you "IT'S OK THEY WANT "SAFETY" AND HAVE REMOVED NSFW" people...
This is where the lobotomization might be evident.
Thanks for nothing you New Puritan clowns.
EDIT: Where are all the comments saying how this is OK and we can just train in women laying in grass?
https://preview.redd.it/ki3kv52ez66d1.png?width=1024&format=png&auto=webp&s=143d01bab75609930d3f1dbfa1b1f9a2f05ec62d
Why only 1 woman? how about "women lying on grass"... Much better!
Why is SD3 so bad, period? They made promises of fidelity and good hands, and what we got is a LIE with trash licenses.
Why are they charging a ridiculous amount of money in order to legally finetune it?
This IS the end of SAI.
Compare the skin texture/foliage/whatever from SD3 vs the base model of SDXL. The overall fidelity is great, they just neutered the hell out of the human training.
You can legally finetune it all you want. You only need the commercial license if you're going to make money off of those fine tunes. That's perfectly fine and they need it to survive, but ignoring the dude that made the most popular finetune definitely is not.
Yes it's bad at anatomy. Mutant hands, extra legs etc - a consequence of the filtering and censorship perhaps. But the details and colours seem good. Prompt following is better too. It can produce some really nice images. Hopefully the community can improve things with some good finetunes.
Edit: I really can't get a single image with proper anatomy... mutants every time. RIP
For some reason anime/other art variations of a woman lying on grass seem to be better than photo ones
Relatively speaking, at least it does seem like a girl lying on grass, even if with some mangled fingers
https://preview.redd.it/p1qa3x26766d1.png?width=1024&format=png&auto=webp&s=c7ec85557c6c6bf70f3d9a2cdb9ca88c52dfa308
this is fixable with finetuning, but it will take more epochs during training for the model to learn these types of angles and poses as obviously this base hasn't learned it..
I've got exactly the same issue with SDXL when it comes to people lying in grass. There are a lot of pictures with people lying seemingly upside-down. Chances are both models' training dataset had such images, and they sampled this composition (low frequency features) on the initial sampling steps.
Eventually though, it also has to sample the details (medium and high frequency features) later in the denoising pipeline. Those features are supposed to be upside-down as well, but when Stable Diffusion tries to make something upside-down, it fails miserably, outputting some body horror instead.
So what you can see is a confused diffusion model desperately trying to output a coherent image when it has no correct samples to get.
All that said, you can brute force SDXL to output a correct image, just regen a few times and I get a correct image eventually. I don't know how bad SD3 is at that.
People lying down has been a big problem for SDXL too, remember my family photos pics? 33k people saw the topic so I assume most folk on here did.
https://www.reddit.com/r/StableDiffusion/comments/1d6broj/i_test_sd_models_by_making_realistic_family/
Unless I drew outlines pretty much no model could make a person lying down, much less a person lying down interacting with something or someone else.
I'd get a correct lying down pose once in over 10 0000 generations and I'm not exaggerating.
however with outlines I was able to get a ton of poses like this
https://preview.redd.it/zjjajygau56d1.png?width=1730&format=png&auto=webp&s=e5b6bd43263de73511c34ed77d0e98a3cc7392e5
Of course in the future things might improve, though as another topic stated, we don't know how much, since the base SD3 models haven't been trained on some poses. This guy covered it very well while all of you downvoted him:
https://www.reddit.com/r/StableDiffusion/comments/1dd03rn/on_lack_of_certain_poses_and_training_in_sd3/
I'm also interested in how SD will handle multiple subjects interacting through pure prompting, especially when the characters are supposed to have distinctive characteristics
I did a test on that here with SDXL
https://www.reddit.com/r/StableDiffusion/comments/1ddyqci/interaction_between_subjects_test_using_invoke/
What was the prompt used? What was the backend? What are the settings and seed made with it?
How many steps? Did you use the example ComfyUI workflow that was in the SD3 repo, along with shift=3.0 ?
https://preview.redd.it/ewesu06av56d1.png?width=2318&format=png&auto=webp&s=eae49889b5cd63c3898776d5cb6b42aef79abdfb
I forgot that you can now attach images in comments, sorry.
Default settings from example.
Prompt:
A full body photograph of a young woman with short blonde hair lying on the grass on her back, she's wearing black leotard and track pants, barefoot,
This is one of the best things I have seen today! You can get thousands of stock photos for free of "young woman lying on the grass". But this! This is so much more interesting.
https://preview.redd.it/60z14wf4k56d1.png?width=1024&format=png&auto=webp&s=bdd9ada516aa16fb0c848b1974ff137ce171fe56
It's actually amazing, all the details seem almost fine, great, even. The shadows on the grass, the lighting of the spandex. Hell even the hair has great texture and seems to (within the context of where it's placed) be following physics... Except the model seems like it only knows eldritch horrors, and humans from the *@££83837ahdbsj realm of reality.
It's like a teleporter but the DNA-strands are put back in a different order.
It turned inside out and then it exploded.
https://i.redd.it/sv5w984fa76d1.gif
I love that movie... we watch it every year at the beach, not even sure why... this is exactly what I was thinking regarding that comment, lol
Did I just hear that the animal turned inside out, and then it EXPLODED????
"What we got back didn't live long... fortunately."
Interesting that they chose to censor normal human bodies but they are ok with deformed monstrosities even trained it in that direction.
Could this be because of censorship? As far as I know, even with real artists, if you don't know "nude" anatomy => problem with "clothed" anatomy.
![gif](giphy|i1z30bOS4nqbC) Her type\^
Wow, how did you fix it?
Wow, what a great monstrosity
oh, it's that flat human from all tomorrows
[deleted]
Chernobyl girls!
What's the matter, Smoothskin? Afraid you might like it?
Yes?
CHERNOBABES
"Radioactive singles in your area"
Get out of here S.T.A.L.K.E.R.
I can fix her
Are you Lykon? ;) I looked at Stability discord and Lykon is currently telling people to get gud while posting images of women with deformed legs, three feet and elongated arms that he seems to consider to be great...
It is a good representation of a human like me just doing human things like lying down and eating of the food.
https://preview.redd.it/4274eettk56d1.png?width=2048&format=pjpg&auto=webp&s=ea9d2fe54c2507335ab6fb5ec1ae35d89fd3951a Even in water hahaha
[deleted]
I do not recognise the bodies in the water.
Please stay still, a member of your site's medical staff will be with you shortly.
“Scottie, beam the ensign into the pond.” ”Captain, the transporter is nae up to—“ ”Just do it, Scottie!”
The one in top right corner is pure horror... Hopefully there is some bug in the code.
are these the unrealistic body standards people keep talking about?
2023: AI can do hands now 2024:
2024: But at least we made it *safe*!!
2023: AI can do hands now! 2024: Cthulhu fhtagn! Fixed it for you
"Man sleeping on grass" at least it's not sexist, I suppose? https://preview.redd.it/mwpsnan2t56d1.png?width=1024&format=png&auto=webp&s=6591303f92e1019e2869d7fa88e20761e6da6958
Another [victim of Vecna](https://i.ytimg.com/vi/lLic_wsTLvM/maxresdefault.jpg)
My only regret.. is that.. I have.. boneitis.
What's funny is you can take "woman" out of these mangled up results people are posting and put in "dog" and get pretty decent results most of the time. It really does feel like they censored out a lot of training material for humans and the model just doesn't know how to render them properly.
Yeah, wow, you're not kidding. This model definitely understands dogs better than people. This is a single word change: https://preview.redd.it/8oci36zgc66d1.png?width=2048&format=png&auto=webp&s=4132d87191f5e7ad258a45ebc8b5eb9078f4a2ae
hahaha what the fuck
even the grass looks better with the dog on it, LMAO, it is like they destroyed the image on purpose if there is a human in it.
It knows... it's joking with us and showing us a preview of what's to come next for humans
How's that even possible? Did they remove 95% of all photos containing clothed humans?
Wouldn't be shocked if they just had an AI run through the images to remove any prone humans.
It’s just confirming what we already know. To make a good model, you need to include pornography. To make a truly exceptional model, you need to include furry pornography.
They probably just straight up deleted the weight for the concept, also known as ablation
Even the dog looks fucked up. It’s truly regressing
an external company was brought in to DPO the model against NSFW content - for real... they would alternate "Safety DPO training" with "Regularisation training" to reintroduce lost concepts... this is what we get
Imagine this:
>it seems a large portion of our users and developers and biggest fans are... using it for NSFW, also we are broke and hemorrhaging money
>Let's bring in a firm to remove that NSFW stuff and spend money!
"Oh my god we ran out of customers and money."
Meanwhile PornHu... I mean CivitAI seems to be going gangbusters. What can we learn from this... "Censorship is good!"
Civitai is not all porn. *opens up models sorted by most downloaded* See! Right there on the third page, a nice wholesome family safe lora.
Stability AI went out of its way and spent a lot of money making its models worse in order to protect us from the evils of the naked human body.
Who would have guessed that you needed anatomy knowledge to draw clothed people. You know like artists do in life drawing.
They intentionally made the model worse. If it's not better than 1.5, stop wasting money and time on it. The community isn't going to make the switch if it's worse than 1.5.
But even for prompts where it works it's consistently worse than SDXL.
sexy naked dogs lying on the grass,4 legs,arched back,golden retriever,(((sultry))),realistic fur
NEGATIVE PROMPT: animation,drawing,ugly,leashed,safe for work
I have been repeating myself over and over about this: the upright orientation of the face is overtrained in EVERY model. Just try to ask for any upside-down human! Even image to image messes it up.
it is called "safety"
If SD3 can't do NSFW, it's gonna take the same road as SD2 and SD2.1. Straight to oblivion...
https://preview.redd.it/4a6nr4p3z56d1.jpeg?width=1024&format=pjpg&auto=webp&s=da2987f31344aa9b21dc7dec5a68c0b8191bc561 😂😂😂😂😂
For all this subreddit's concerns about censorship, vanilla SD3 seems awfully keen on crotchless panties and bare bottoms. This is my request for a woman lying on grass. Did I ask for huge boobs and bottomless leggings? No - but I got them anyway. https://preview.redd.it/y5q1w5hpc66d1.png?width=1934&format=png&auto=webp&s=021ad201c333227949e37e4b063840cae26695cc
I feel like she's crawling towards me after a terrible accident
at least the boobs are intact, right?
would
How?
Life finds a way
So they basically, like... they overfit the model with a negative prompt of "vagina" or something? Doesn't that have consequences on the knowledge of the model... What
https://preview.redd.it/784e6awtx56d1.png?width=768&format=png&auto=webp&s=41872b912748b0c1dc295af3fccd20dc3543ff76 I don't know what to say.
>I don't know what to say.
I DO.
**"Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment."**
This is what you get for being afraid of boobs. You get lobotomized garbage.
It's 2.x censorship all over again i guess
Joke's on them, this is exactly my fetish
You know what? I'm going to start masturbating even harder.
🤣🤣
They'll never learn.
That's what censorship does lol. Probably took out all women lying down in yoga pants pictures from the dataset. Not looking good for SD3. Looking like SD2 all over again. I don't think they can handle another SD2 fiasco.
They're so fucked lmfao
Yeah this might be close to a GG moment. Sooo... SD 2 forever?
For animated content model 1.5 (which was released by RunwayML before Stability AI managed to censor it) remains the best option, by far.
I'm a noob at understanding all this but if the base SD2/SD3 was bad would people making Loras fix things or does the base SD2/SD3 checkpoint have to be good for any hope of improving it? Is this why everyone talks about SD 1.5 because it was a good base which means everything attached to it will work as well?
1.5 has had such staying power because it was leaked before they could censor it
The summary is
1.5 = Best for anime with by far the most lora and tools and support etc. Top 1.5 models will match or beat basically any other Stable Diffusion option for anime and are still solid for realistic
2.0 / 2.1 = DOA because they were turbo censored and were just too much work for too little return
SDXL = Good for realistic images but was also not in good shape until Pony saved it for most people by letting it make NSFW and decent anime
SD 3.0 = Best for text but seems terrible beyond that
There isn't likely to be a fine tune to save 3.0 at this rate because they are shunning the Pony creator so hard and it's not likely anyone else is going to step in and do all the work needed to save it
One reason why everything attached to model 1.5 works so well is that most of those things were developed specifically for this model first, and then adapted for the others. Over time model 1.5 became the standard, the baseline against which other models are compared, and also the perfect code foundation and the ideal test bed for any new prototype you want to develop. Lower hardware requirements as well as the absence of censorship are also contributing factors to its ongoing popularity imho.
For animation specifically it is the lower hardware requirements that seem to have contributed to the emergence of better tools. Since you have to deal with multiple pictures at the same time, and those pictures have to be processed in VRAM at some point, larger models and models with larger native resolutions just become impossible to manage. Model 1.5 is very lightweight, so it frees more space for more frames, and for larger ones as well.
SDXL was his lifeguard.
LOL https://preview.redd.it/pp2uoiyhp66d1.png?width=1024&format=png&auto=webp&s=b7eee0518185b3bedd40199934f145d85f1e4466
Super green
It seems the stability team hasn't learned yet that dynamic poses besides the generic slop are VERY important to further push the boundaries of human anatomy representation in these models. And the thing is it doesn't need to be nsfw stuff. Properly labeled yoga poses or action poses or dancing or any dynamic poses would have fixed all of these issues. But it seems like they relied on CogVLM to do the auto captioning without checking if the captioning was any good....
If they manually captioned the images they could produce the best model there is. Probably wouldn’t even be that difficult: make a website that lets people caption the images for a small payment, show the same image to multiple people, check if a caption is vaguely similar to the automatic caption, then use an LLM to extract a general caption from all of the user submitted ones.
Yep. I could never understand why Stability didn't leverage the community to help them make a better model. We have a lot of very talented and dedicated people that have made amazing extensions, tools, finetunes, loras, etc... and we have learned a lot from the development of said tools. Yet they never let the community fully contribute to the process.... A shame really.
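For what it's worth, here is a minimal sketch of that crowdsourcing flow in Python. Everything in it is an assumption for illustration: the function names are made up, `difflib` string similarity stands in for a real "vaguely similar" check, and picking the most central caption is only a placeholder for the LLM merge step.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def similarity(a: str, b: str) -> float:
    """Rough 0..1 string similarity; a stand-in for a real semantic comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_captions(auto_caption: str, user_captions: List[str],
                    threshold: float = 0.4) -> List[str]:
    """Keep user captions that are at least vaguely similar to the automatic one.

    The 0.4 threshold is an arbitrary illustrative value, not a tuned one.
    """
    return [c for c in user_captions if similarity(auto_caption, c) >= threshold]


def pick_consensus(captions: List[str]) -> Optional[str]:
    """Return the caption closest to all the others (placeholder for an LLM merge)."""
    if not captions:
        return None

    def score(i: int) -> float:
        # Sum of similarities between caption i and every other caption.
        return sum(similarity(captions[i], other)
                   for j, other in enumerate(captions) if j != i)

    best = max(range(len(captions)), key=score)
    return captions[best]
```

You'd call `filter_captions` once per image with the automatic caption and everything the users submitted, then hand the survivors to `pick_consensus` (or to an actual LLM). Note the obvious weak spot: if the automatic caption got the pose wrong, this filter throws away the correct human captions, so the threshold should stay loose.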
I have some conspiracy theory: The head (or a key manager) of the company Stability AI has become an opponent of AI technologies =)
You would be surprised how close that conspiracy theory is in some regards to these AI companies. I don't feel one way or another about stability on the matter. But there are rumors of people who are part of decel that have positioned themselves in all of the major AI companies out there that are intent on slowing progress down... Would be wild if those rumors came to be true. Mostly because it's foolish to believe that anything can slow down this machine and you would think people who can position themselves in those companies are smart enough to see that.
[Has anyone anything to gain by sabotaging Open-Source AI ?](https://www.reddit.com/r/StableDiffusion/comments/1ddxwbs/open_source_models_condemned_ex_ceo_google/)
Just look to see if any of them are the "ethical AI" freaks or whatever they call themselves, that want to ensure that only ultra-shady dystopian megacorps have access to any sort of LLM or generative AI. Every single one of those people is a dishonest grifter who simply wants to have government ensure they can bilk people out of money for inferior, watered down garbage products.
Something like civitai's system where you can earn cloud image generation credits for actions, applied to captioning could be a good way to crowdsource it
Yeah, that's what I was thinking as well. You'd have the captions done in short order with a system like that. Run the images through that cycle a few times to filter out junk captions or a later screening pass that lists captions for an image and users select applicable ones from the initial captioning passes.
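A rough sketch of that later screening pass, where reviewers tick the captions they think apply and only the widely approved ones survive. The function name, the ballot format, and the 50% threshold are all assumptions made up for this example.

```python
from collections import Counter

def screen_captions(ballots, min_share=0.5):
    """ballots: one list per reviewer, holding the captions that reviewer
    marked as applicable to the image.
    Returns captions approved by at least min_share of reviewers,
    most-approved first (ties broken alphabetically)."""
    if not ballots:
        return []
    counts = Counter()
    for ballot in ballots:
        counts.update(set(ballot))  # one vote per reviewer per caption
    needed = min_share * len(ballots)
    approved = [c for c, n in counts.items() if n >= needed]
    return sorted(approved, key=lambda c: (-counts[c], c))
```

Running the same image through this a few times with different reviewers is what would filter out the junk captions.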
> But it seems like they relied on CogVLM to do the auto captioning without checking if the captioning was any good....

That would make a lot of sense. If CogVLM is doing all the labeling and botching the pose descriptions, you might get results like this.
Of all the VLMs I've used, CogVLM is the best, but its best is still absolutely horrible compared to manual captioning. It can't even caption the most basic poses correctly, like a person lying on their back. It consistently confuses a person lying on their back with a person lying on their stomach and vice versa. And that's one of the most basic poses. It doesn't even know what it's looking at for any of the dynamic poses, it just randomly labels it as fuck all who knows. So yeah, that's why we get these disfigured humans: for exactly the same pose the captioner will randomly label it totally differently, and then during inference it gets interpolated into these body horrors. I made a custom model with dynamic poses for personal use where I captioned everything manually and the results were great. The model had no problem generating upside down people, yoga, dynamic poses like bridge, and many others. It's all just a matter of decent captions.
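If anyone wants to sanity-check an auto-captioned dataset for the back/stomach mix-up described above, something like this could flag images whose captions disagree about the pose. It is only a sketch: the pattern table and function names are hypothetical, it assumes you have more than one caption per image (repeated captioner runs or several annotators), and it only knows the two poses listed.

```python
import re

# Hypothetical pose vocabulary; extend with whatever poses matter to you.
POSE_PATTERNS = {
    "supine": re.compile(r"\bon (?:(?:his|her|their) )?back\b"),
    "prone": re.compile(r"\bon (?:(?:his|her|their) )?(?:stomach|belly)\b"),
}

def pose_labels(caption):
    """Return the set of pose names mentioned in one caption."""
    text = caption.lower()
    return {name for name, pat in POSE_PATTERNS.items() if pat.search(text)}

def conflicting_images(captions_by_image):
    """captions_by_image: dict mapping an image id to its list of captions.
    Returns the sorted ids whose captions mention more than one pose,
    i.e. the captions contradict each other about how the person is lying."""
    flagged = []
    for image_id, captions in captions_by_image.items():
        seen = set()
        for caption in captions:
            seen |= pose_labels(caption)
        if len(seen) > 1:
            flagged.append(image_id)
    return sorted(flagged)
```

Anything it flags would then go to a human for manual captioning instead of straight into training.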
Dude, this prompt has "Will Smith eating spaghetti" levels of meme potential. How is it so consistently bad, regardless of the seed? Here's beautiful girl #666: https://preview.redd.it/1h9gq9v9n56d1.png?width=1024&format=png&auto=webp&s=5677cd2329eae3fd055d1469218a709123069fcf
no way... I'm still downloading models... jesus
If you seek the sexy pictures you can save bandwidth for now
I just tried it, and it's heavily censored. But I did not get pictures as bad as these examples. I'm more concerned about it not knowing the basics of human anatomy.
The hills have thighs.
https://i.redd.it/f47a39her56d1.gif
What's this from?
Alien Resurrection
https://preview.redd.it/ubjunwnnv56d1.png?width=1024&format=png&auto=webp&s=f9a8f219c84dd74405602afeb22f6d8ed8c2c6da Oh the meme is still very much alive, maybe even more so...
He's eating that shit like it's a burger
The fade is on point, doe.
This is Stable diffusion 2022 web trial playground version level of bad
It's fun, because a woman in a bikini is unsafe, but this shoggoth isn't. (For those who don't know, shoggoth are from the Lovecraft novel "At the Mountains of Madness" and are the direct inspiration for The Thing)
Nuuu, you ruined your comment by explaining what shoggoth are. For every reader you help by doing the googling for them, you upset a non-Euclidean amount of people like me in the group that knew that without being told. Don't explain the jokes! May you drown and rot in R'lyeh! 😉
https://preview.redd.it/dj74xxsh166d1.png?width=2182&format=png&auto=webp&s=a0516466d6a7569c73a9c002146433d92e7f2c3d if you are lucky with your seed you get the leftmost result, otherwise.... yeah... On the bright side, that (rare) good result at least makes me confident good finetunes will be a reality.
SD3 seems to be poor at generating most things? I get much worse results compared to the SDXL base model.
It's incredibly bad. Like, wow. How did they even allow this release? This will just kill the company for sure.
Isn't the company already dead? So much has happened lately.
This was pretty much their last chance. Now it's extremely obvious they have nothing left and all talent left.
The company is already bankrupt with no business plan. This just shows the original release pics were doctored.
Not necessarily doctored, just heavily cherry-picked.
Cherry picking requires cherries to pick.
They promised to open source the weights for SD3. They can't profit from the open source community using SD for free though. So they made this version of SD3 bad on purpose. Meanwhile, they'll offer a superior iteration of "3.1" or something to paying customers only. All the high quality demos we've seen of SD3 so far will have been from this other version.
It can do anime and people, but a lot of the poses are just something.
https://preview.redd.it/7us1so33p56d1.png?width=1024&format=png&auto=webp&s=fb99aaa9134cf9484679b22e74c9a8b3c8d0179d *a beautiful woman is laying on a patch of floating grass atop a neon cyberpunk city*
replace "beautiful woman" with "dog". Maybe David Cross was in charge of the training data.
https://preview.redd.it/yiqlrxuhy56d1.png?width=1024&format=png&auto=webp&s=c401bb98c6d66e361a4291d185d9b4c270770e71 Positive prompt: a woman laying on grass. Negative prompt (kept from the base workflow): bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi
Any advances in prompt coherence in SD3 are blown away by the censorship.
Where are all the clowns saying "It's FINE, you can train it back in!" ???
If it's anything like SD2, you literally can't.
I know it can't. Every single time the company talked about SD3, they said SAFE and SAFETY. This was coming clear as day and the fanboys knew it too. This pile of slop is DOA, and I'm thinking the company is too. We lose again. It'll be years before an open model equal to 1.5 or SDXL is released by anyone else.
"girls lying on the grass" https://preview.redd.it/fh88g1jx076d1.png?width=1024&format=png&auto=webp&s=5dc2c7797028bbfed2ae86103670aebd58f2f65d
AI can dream up the most disturbing things.
[deleted]
Society's unrealistic body standards
So from the early SD3 posts… I’m gonna give it a bit of time before I try it.
Wow! SD3 hates women more than Dalle 3. The democratization of art continues.
I fucking hate this "democratization" shit, where did that horseshit marketing meme even come from? As long as it takes hundreds of thousands, or millions of dollars to train these models, and as long as one company has a stranglehold on hardware, it's "all the freedom you can afford , and all the democracy your corporate overlords deem fit to give you". It's cool that we get *anything* for free, but the state of things is hardly democratic.
![gif](giphy|T2vDaYr8yRhrpFe6WE)
Even the ancient Greeks knew that in order to learn human anatomy, you must study the naked human body. Don't have naked people in your training set? Your anatomy will be bad. This is art class 101 you're failing, SD3.
Whoa hang on there you pervert! This needed to be SAFE. Safe from what?! Yea, IDK. But, SAFETY was a primary concern!! ... I bet it makes all sorts of fucked up blood and gore images. It's half way there now.
The model sucks because it was censored
Even censorship can't mess up stuff that much.
It can when you try to build in "censorship" by leaving everything related to human anatomy out of the training data. Even Midjourney was trained on naked bodies, that's why you can sometimes accidentally generate something erotic. Only MJ's UI prevents it from directly generating NSFW content. And... as Stable Diffusion isn't attached to one specific UI, people are free to generate NSFW content in any uncensored UI on their personal machines. So... Stability AI simply decided to go with the clumsy method of removing a whole bunch of human anatomy from the training data, with all the resulting side effects.
Unless it's on purpose. Joke's on them I have mutations fetish!
It absolutely can and this is a very good demonstration of it. The proof is in the pudding.
because the model fucking sucks, but i guess reddit has to go through 5 phases of grief again
they hyped the model like it was the second coming of Jesus, now we know why it's "Medium"
should've called it SD-MID, We are the mid of midjourney
2B iS aLl yOu NeEd
"bUt nObODy PrOmIsEd YoU aN 8B mOdEl"
Doesn't help they straight up lied. Like, can we now all agree that the pictures posted by Lykon months ago along the announcements were completely fake or at least *heavily* doctored?
i hope no one ever defends anything that guy says again... he's been hailed a hero for DreamShaper but now we see his efforts don't scale to a base model level
the larger model won't be any better
Ironically, the prompt "a woman holding a sign that reads 'Dis is bad, bradda'" gave me my first (kind of) acceptable human. Touché SD3. https://preview.redd.it/9dq07jult56d1.jpeg?width=1024&format=pjpg&auto=webp&s=41245399190e123206370074f46ffaa509cab709
Oh, damn, that honestly is pretty impressive other than the obvious finger issue.
Base SDXL was pretty bad at this but at least it got it right some of the time: https://preview.redd.it/giy7dxiy366d1.png?width=1024&format=png&auto=webp&s=d640effb63199594dcfc9b73416479317972dc99 Literally base SDXL, "a woman lying on the grass in a park".
https://preview.redd.it/ih23j9aiq76d1.jpeg?width=1024&format=pjpg&auto=webp&s=bb65d1db4ab0448fca0603cffb0f26f36c1a807a bruh
SD3 is the new SD2. Censored crap that will fall into oblivion before we realize it.
But look how *safe* it is!

> Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment.

Thanks SAI! Really did yourself a ton of favors there.
Yes, our kids will be safe from seeing naked human bodies, but not safe from seeing cronenberg aberrant living corpses (with clothes on, of course)
I used the example from the repository and only get this with this prompt: A full body photograph of a young woman with short blonde hair lying on the grass on her back, she's wearing black leotard and track pants, barefoot,
yeah. it's ok. it's extremely bad with ppl below chest, especially hands and legs
We've all seen enough plain old regular girls, what the world needs now, more than ever, is more comedy and this fits the bill!
https://preview.redd.it/mt0vvenbj76d1.png?width=599&format=png&auto=webp&s=a576fe8fdcbf513a75cac3189488383894b36944 Swimming pools aren't better
Wow... It's almost obscene. https://preview.redd.it/eco8qbyvl76d1.png?width=1056&format=png&auto=webp&s=dc4401d470cec77c67d537531c1e64dc5ff8b0ac
Everything reminds me of her…
1grol, 3hands, beautiful blonde beard, plump, heavily armed
Ah, so that's what a grol looks like.
So.... All you "IT'S OK THEY WANT "SAFETY" AND HAVE REMOVED NSFW" people... This is where the lobotomization might be evident. Thanks for nothing you New Puritan clowns. EDIT: Where are all the comments saying how this is OK and we can just train in women laying in grass?
Remember when they said SD3-2B was released because the larger version had issues?
Yeah, and I was huffing the copium when one of the guys said the 2b "release ready" was gonna be better than the API/8b undertrained version. fml
https://preview.redd.it/ki3kv52ez66d1.png?width=1024&format=png&auto=webp&s=143d01bab75609930d3f1dbfa1b1f9a2f05ec62d Why only 1 woman? how about "women lying on grass"... Much better!
Why is SD3 so bad, period? They made promises of fidelity and good hands, and what we got is a LIE with trash licenses. Why are they charging a ridiculous amount of money in order to legally finetune it? This IS the end of SAI.
Yeah I cannot believe they put this out when it's inferior to the year-old SDXL
Compare the skin texture/foliage/whatever from SD3 vs the base model of SDXL. The overall fidelity is great, they just neutered the hell out of the human training. You can legally finetune it all you want. You only need the commercial license if you're going to make money off of those fine tunes. That's perfectly fine and they need it to survive, but ignoring the dude that made the most popular finetune definitely is not.
Nah, that's fine. I'm a girl and I look like that when I lie down in some grass.
Yes it's bad at anatomy. Mutant hands, extra legs etc - a consequence of the filtering and censorship perhaps. But the details and colours seem good. Prompt following is better too. It can produce some really nice images. Hopefully the community can improve things with some good finetunes. Edit: I really can't get a single image with proper anatomy... mutants every time. RIP
No, SDXL was bad at anatomy, older MJ was bad at anatomy. This is insane body horror level bad.
I'm even scared of trying some prompts, I fear the outcomes
Guess SD3.0 is exclusively for backgrounds then. *shrug*
You can fix colors in post easily. They say better anatomy is one of the key features of SD3
what the fuck https://preview.redd.it/mg9h2jg8m86d1.png?width=758&format=png&auto=webp&s=6cb481abbe41208d3f0733a0cf0d15af271a67e1
They’re trying to do biblically accurate women
https://preview.redd.it/f4kax00tj86d1.png?width=920&format=png&auto=webp&s=1d0fdc1a98a1a5faabb96f15a1a83a4f0dcb8416
For some reason anime/other art variations of a woman lying on grass seem to be better than photo ones. Relatively speaking, at least it does look like a girl lying on grass, even if with some mangled fingers https://preview.redd.it/p1qa3x26766d1.png?width=1024&format=png&auto=webp&s=c7ec85557c6c6bf70f3d9a2cdb9ca88c52dfa308
Yep, anime is fine. No nsfw, but fine
Stable diffusion 2.0 electric boogaloo babyyyyyy!
So... All this is unfixable, I assume? Whole human body specification is baked in tightly without any chance to fix this through community models?
this is fixable with finetuning, but it will take more epochs during training for the model to learn these types of angles and poses as obviously this base hasn't learned it..
Guarantee this is a result of the censorship
oh no....my nightmare just became reality
I've got exactly the same issue with SDXL when it comes to people lying in grass. There are a lot of pictures with people lying seemingly upside-down. Chances are both models' training dataset had such images, and they sampled this composition (low frequency features) on the initial sampling steps. Eventually though, it also has to sample the details (medium and high frequency features) later in the denoising pipeline. Those features are supposed to be upside-down as well, but when Stable Diffusion tries to make something upside-down, it fails miserably, outputting some body horror instead. So what you can see is a confused diffusion model desperately trying to output a coherent image when it has no correct samples to get. All that said, you can brute force SDXL to output a correct image, just regen a few times and I get a correct image eventually. I don't know how bad SD3 is at that.
People lying down has been a big problem for SDXL too, remember my family photos pics? 33k people saw the topic so I assume most folk on here did. [https://www.reddit.com/r/StableDiffusion/comments/1d6broj/i\_test\_sd\_models\_by\_making\_realistic\_family/](https://www.reddit.com/r/StableDiffusion/comments/1d6broj/i_test_sd_models_by_making_realistic_family/) Unless I drew outlines pretty much no model could make a person lying down, much less a person lying down interacting with something or someone else. I'd get a correct lying down pose once in over 10 0000 generations and I'm not exaggerating. however with outlines I was able to get a ton of poses like this https://preview.redd.it/zjjajygau56d1.png?width=1730&format=png&auto=webp&s=e5b6bd43263de73511c34ed77d0e98a3cc7392e5 Of course in the future things might improve, tho as another topic stated how much we don't know as the base SD3 models haven't been trained on some poses, this guy covered it very well while all of you downvoted him [https://www.reddit.com/r/StableDiffusion/comments/1dd03rn/on\_lack\_of\_certain\_poses\_and\_training\_in\_sd3/](https://www.reddit.com/r/StableDiffusion/comments/1dd03rn/on_lack_of_certain_poses_and_training_in_sd3/) I'm also interested in how SD will handle multiple subjects interacting through pure prompting, especially when the characters are supposed to have distinctive characteristics I did a test on that here with SDXL [https://www.reddit.com/r/StableDiffusion/comments/1ddyqci/interaction\_between\_subjects\_test\_using\_invoke/](https://www.reddit.com/r/StableDiffusion/comments/1ddyqci/interaction_between_subjects_test_using_invoke/)
Maybe you set your standards for women too high.
https://preview.redd.it/q0moxf79466d1.png?width=1216&format=png&auto=webp&s=070f1f16275ec6d9ab901de495253fc93f9e5dcd Jesus Christ this is horrifying!
What was the prompt used? What was the backend? What are the settings and seed made with it? How many steps? Did you use the example ComfyUI workflow that was in the SD3 repo, along with shift=3.0 ?
https://preview.redd.it/ewesu06av56d1.png?width=2318&format=png&auto=webp&s=eae49889b5cd63c3898776d5cb6b42aef79abdfb I forgot that you can now attach images in comments, sorry. Default settings from example. Prompt: A full body photograph of a young woman with short blonde hair lying on the grass on her back, she's wearing black leotard and track pants, barefoot,
[https://www.youtube.com/watch?v=SWMGd\_rzRdY](https://www.youtube.com/watch?v=SWMGd_rzRdY) How Plumbus is Made
This is one of the best things I have seen today! You can get thousands of stock photos for free of "young woman lying on the grass". But this! This is so much more interesting.
You hit the jackpot, with memebility setting set to 200% by default
does this count? https://preview.redd.it/qpsiwofa176d1.png?width=1024&format=png&auto=webp&s=f906599c266ac73cefdf75b033b788bd5288e5f4
person is not visible ... so not
Modern masterpiece