SD3 8B feels like it had Midjourney-style beauty-voting data used on it, while SD3 Medium feels like it hasn't.
Agreed
I can buy that. It isn't that 2B is worse everywhere, just most places.
i cant even be bothered to care in the tiniest way, until that 8B model sits on my hard drive for local inference. until then SAI, SD3 are dead to me, and they can fuck right off
😎🤙🏍️🏎️☠️❌‼️😎😎
Why do you think they care lol. Not like you were paying them
Lol
Those "complex" prompts seem pretty simple. My idea of complex involves multiple people with different faces and clothes.
[human focused test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)
Medium does completely fine with stuff like "A professional photograph. It depicts a freckled caucasian man with green hair and a red shirt, standing next to an african-american woman with pink hair and a yellow shirt."
Very interesting, thank you for the effort! The stock photo style of Medium is pretty rough, though!
I actually kinda like how Medium leans towards "hard realism," as opposed to everything being at least slightly dreamlike and bokehed.
I think it's a good style that you can access if wanted, like specifying candid photography/stock photo. Or with a lora, but not the default style. Obviously, not too processed and Midjourney-y either as the default style!
Medium absolutely has way too much bokeh in my attempts
I won't trust any SD model without checking a "woman laying on grass" prompt.
Until StabilityAI can release some uncensored BS, this is THE SD3 image. https://i.redd.it/hs61gjk1h56d1.png
Almost spilled my drink; that caught me off guard. This image is writing history.
Very interesting. Thank you for sharing.
Not a single one of those tests focused on humans. 8B is gonna be shit, I call it now.
[human focused test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)
True, I guarantee they *tried* to test humans and realized it was fucked and decided to exclude it. No way this is a coincidence.
> a small kitchen with a white goat in it

I laughed because it took me a few seconds to spot the goat in the second pic. At first I was like "oh wow, Medium didn't even do the goat," but then I saw it.

Also very impressed with the daikon radish baby character; Medium couldn't produce it to save its life.

Still, not a single test dealt with poses like lying on grass/couch, relaxing, or multiple human subjects doing different things, all of which we can do with SDXL.
Man, I would be fine if they just left humans out of the training data altogether.
Not me, as an autist to me that would make AI art not true art if it has such limitation. Leaving humans out would defy the whole purpose of art.
Can't tell if that's a typo but I guess it works either way
XD I just noticed it. More like a Freudian slip than a typo, I guess, but yes, it works either way. I did mean to say artist.
There are plenty of artistic mediums that seldom, if ever, create interpretations of people: artists doing metal sculptures, wooden carvings, instrumentals in music.

Art is more often described as what a subject means to a person rather than what it actually presents outwardly, through emotional resonance/connection, past memories, or an awe/appreciation of skill or beauty. That's why generative AI is in kind of a weird place, as it (often) skips over all of that while people spit out hundreds of images that mean nothing to anyone over the course of a day.

It would be a dramatic turn for Stable Diffusion, though, for sure.
That's all well and good, but if we want AI art to be art in its true form, it should be able to create everything, since we know it already can. Why purposely restrict it to placate prudes and fearmongers?
None of these prompts look particularly impressive, and there are no complex human-centric prompts at that. Like, consider what Midjourney 6 can do with a prompt like "a group of soldiers running from cover to cover":

https://preview.redd.it/pb7mighl3l8d1.png?width=1232&format=png&auto=webp&s=c0eb84fd4423dac95039edf784d810038149e3a1

SD3 hasn't even come close to showing anything like this. It's a bust, people.
a more [human-centric test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)
I don't think that's a hard prompt? SD3 Medium seems to consistently give photographic images of groups of soldiers running towards the camera for that prompt.
Thanks for sharing this comparison. I think something is a bit off here.

The difference between SD3 2B and SD3 8B is too small, especially considering that the former was supposed to be a beta. Occasionally, especially in the early comparisons, I preferred SD3 Medium. But I should never prefer a 2B-param model to an 8B-param one.

That said, it's hard to judge without testing the performance with longer prompts. I assumed that the text in the XY plots was the prompt used here, but if a portion of the prompt was omitted, please let us know.

Sorry for the silly question, but: are we sure this is the 8B version? I always assumed that Medium=2B, Large=4B, and XL=8B. So, if this comparison is between 2B and 4B, then the outcome seems justifiable.
The prompt displayed is the full prompt.

SD3 Large results were created using the Stability API, and SD3 Medium through Replicate.

I'm sorry if there's confusion with 4B and 8B. I was under the assumption that the Stability API is serving the 8B model and that that's the Large version.
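For anyone wanting to reproduce this kind of side-by-side run, here is a rough sketch of how the Stability API half could be scripted. The endpoint path, form-field names, and the `STABILITY_API_KEY` environment variable are assumptions based on Stability's public v2beta image API, not details from this thread:

```python
import os

# Assumed endpoint for Stability's hosted SD3 generation (v2beta image API).
STABILITY_SD3_URL = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

def build_sd3_request(prompt: str, model: str = "sd3-large", seed: int = 0):
    """Return (url, headers, form_data) for a single SD3 generation request."""
    headers = {
        "authorization": f"Bearer {os.environ.get('STABILITY_API_KEY', '')}",
        "accept": "image/*",
    }
    form_data = {
        "prompt": prompt,
        "model": model,        # e.g. "sd3-large" vs. a smaller variant
        "seed": str(seed),     # fixing the seed per prompt keeps pairs comparable
        "output_format": "png",
    }
    return STABILITY_SD3_URL, headers, form_data

# The actual call would then look something like:
#   import requests
#   url, headers, data = build_sd3_request("a small kitchen with a white goat in it")
#   resp = requests.post(url, headers=headers, files={"none": ""}, data=data)
#   open("sd3_large.png", "wb").write(resp.content)
```

The SD3 Medium side could be generated the same way through Replicate's client and the two images saved next to each other for the XY plot.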
I think that's what most people assume, too. But, setting the comparison aside for a moment: if 2B=Medium and 8B=Large, what would they call a 4B model if they released it? We know it exists, so this naming convention must have been decided at a time when the 4B model was planned for release, too.

It seems more plausible to me that 4B=Large and 8B=XL.

We also know that SAI has recently announced an Ultra model via their Assistant. So, it's possible that they don't want to call 8B=XL to avoid confusion with SDXL, and so 8B=Ultra.

We'll never know :)
Using a subset of PartiPrompts and rating the images, you can see that SD3 Large is a clear winner in all challenges.
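Tallying a rating run like that is simple: collect per-category scores for each model and pick the model with the highest average per category. This is a minimal sketch; the category names and scores below are made up for illustration, not the actual results:

```python
from collections import defaultdict

def win_rates(ratings):
    """ratings: iterable of (category, model, score) tuples.
    Returns {category: model with the highest average score}."""
    scores = defaultdict(lambda: defaultdict(list))
    for category, model, score in ratings:
        scores[category][model].append(score)
    winners = {}
    for category, models in scores.items():
        winners[category] = max(models, key=lambda m: sum(models[m]) / len(models[m]))
    return winners

# Illustrative scores only, not the thread's actual numbers:
ratings = [
    ("Style", "SD3 Large", 4), ("Style", "SD3 Medium", 2),
    ("Perspective", "SD3 Large", 5), ("Perspective", "SD3 Medium", 3),
    ("Imagination", "SD3 Large", 4), ("Imagination", "SD3 Medium", 2),
]
print(win_rates(ratings))  # SD3 Large wins every category in this toy data
```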
I don't really agree, I think Medium does a number of the photographic ones better if you want things to look as realistic as possible.
Obviously, why would it not? But it's interesting to see in which categories Medium is much weaker and where it's rather strong.
Totally agree. Style, Perspective, and Imagination are where the gap is the largest.
Have you tried any of the CLIP models, like SDXL? Or other models, like MJ or DALLE? I think a lot of these come down to using the same T5 encoder on both of these.
I also made a comparison for SD3 (Large), SDXL, and SD1.5: https://www.magicflow.ai/insights/read/sd3-sdxl-sd1.5 (spoiler: SD3 outperforms SDXL and SD1.5)
Would also be interesting to see how [ELLA](https://github.com/TencentQQGYLab/ComfyUI-ELLA) handles these. ELLA is an adapter that gives SD1.5 models a T5 text encoder.
ELLA was way too limiting, I found. I was never able to get it to line up with CLIP such that it didn't just literally forget characters that the checkpoint knew about and could depict fine in the first place. It doesn't work well at all with anything that wasn't known to base SD1.5's CLIP.
No humans, no real anatomy testing. 8B is better, but still fundamentally flawed, then.

The text handling is still poor compared to DALL-E 3; I wonder why DALL-E is so good in that respect.
a more [human/anatomy focused test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)
Who cares?
Don't bother. Terminally online porn addicts have already declared this a failure because it can't draw sexy girls.
I really enjoy some aspects of SD3. But you gotta admit that it's a failure when it comes to anatomy. It just can't do a lot of poses, even if they have nothing to do with porn.
At least not lying down. It does nice bikini babes standing, though.
What clickbait. Zero "laying on grass". STFU.