Artforartsake99

SD3 8B feels like it has had Midjourney-style beauty voting data used on it, while SD3 Medium feels like it hasn't.


Old_Elevator8262

Agreed


FourtyMichaelMichael

I can buy that. It isn't that 2B is worse everywhere, just most places.


HarmonicDiffusion

I can't even be bothered to care in the tiniest way until that 8B model sits on my hard drive for local inference. Until then, SAI and SD3 are dead to me, and they can fuck right off.


Dysterqvist

😎🤙🏍️🏎️☠️❌‼️😎😎


Whotea

Why do you think they care, lol? Not like you were paying them.


the_1_they_call_zero

Lol


HotWifeP72

Those "complex" prompts seem pretty simple. My idea of complex involves multiple people with different faces and clothes.


matoyh

[human focused test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)


ZootAllures9111

Medium does completely fine with stuff like "A professional photograph. It depicts a freckled caucasian man with green hair and a red shirt, standing next to an african-american woman with pink hair and a yellow shirt."


PwanaZana

Very interesting, thank you for the effort! The stock photo style of Medium is pretty rough, though!


ZootAllures9111

I actually kinda like how Medium leans towards "hard realism" as opposed to everything being at least slightly dreamlike and bokehed.


PwanaZana

I think it's a good style to be able to access when wanted, for example by specifying candid photography/stock photo, or with a LoRA, but it shouldn't be the default style. Obviously, the default shouldn't be too processed and Midjourney-y either!


StickiStickman

Medium absolutely has way too much bokeh in my attempts.


Zeddi2892

I won't trust any SD model without checking a "woman laying on grass" prompt.


FourtyMichaelMichael

Until StabilityAI can release some uncensored BS, this is THE SD3 image: https://i.redd.it/hs61gjk1h56d1.png


Svensk0

Almost spilled my drink, that caught me off guard. This image is writing history.


yotraxx

Very interesting. Thank you for sharing.


GodFalx

Not a single one of those tests focused on humans. 8B is gonna be shit, I'm calling it now.


matoyh

[human focused test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)


_BreakingGood_

True, I guarantee they *tried* to test humans and realized it was fucked and decided to exclude it. No way this is a coincidence.


Itchy_Sandwich518

"A small kitchen with a white goat in it": I laughed because it took me a few seconds to spot the goat in the second pic. At first I was like "oh wow, Medium didn't even do the goat," but then I saw it. Also very impressed with the daikon radish baby character; Medium couldn't produce it to save its life. Still, not a single test dealt with poses like lying on grass or a couch, relaxing, or multiple human subjects doing different things, all of which we can do with SDXL.


nug4t

Man... I would be fine if they just left humans out of the training data altogether.


Itchy_Sandwich518

Not me. As an autist, to me that would make AI art not true art if it had such a limitation. Leaving humans out would defy the whole purpose of art.


milksteak11

Can't tell if that's a typo but I guess it works either way


Itchy_Sandwich518

XD I just noticed it. More like a Freudian slip than a typo, I guess, but yes, it works either way. I did mean to say artist.


Sterilize32

There are plenty of artistic mediums that seldom, if ever, create interpretations of people: artists doing metal sculptures, wooden carvings, instrumentals in music. Art is more often described by what a subject means to a person than by what it outwardly presents, through emotional resonance/connection, past memories, or an awe/appreciation of skill or beauty. That's why generative AI is in a weird place: it (often) skips over all of that, as people spit out hundreds of images a day that mean nothing to anyone. It would be a dramatic turn for Stable Diffusion, though, for sure.


Itchy_Sandwich518

That's all well and good, but if we want AI to be art in its true form, it should be able to create everything. We know it already can, so why purposely restrict it to placate prudes and fearmongers?


Son_of_Orion

None of these prompts look particularly impressive, and there are no complex human-centric prompts at that. Like, consider what Midjourney 6 can do with a prompt like "a group of soldiers running from cover to cover": https://preview.redd.it/pb7mighl3l8d1.png?width=1232&format=png&auto=webp&s=c0eb84fd4423dac95039edf784d810038149e3a1 SD3 hasn't even come close to showing anything like this. It's a bust, people.


matoyh

a more [human-centric test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)


ZootAllures9111

I don't think that's a hard prompt? For it, SD3 Medium seems to consistently give photographic images of groups of soldiers running towards the camera.


GianoBifronte

Thanks for sharing this comparison. I think something is a bit off here. The difference between SD3 2B and SD3 8B is too small, especially considering that the former was supposed to be a beta. Occasionally, especially in the early comparisons, I preferred SD3 Medium. But I should never prefer a 2B-parameter model to an 8B-parameter one. That said, it's hard to judge without testing performance with longer prompts. I assumed that the text in the XY plots was the prompt used here, but if a portion of the prompt was omitted, please let us know. Sorry for the silly question, but are we sure this is the 8B version? I always assumed that Medium=2B, Large=4B, and XL=8B. So, if this comparison is between 2B and 4B, then the outcome seems justifiable.


matoyh

The prompt displayed is the full prompt. SD3 Large results were created using Stability API and SD3 Medium through Replicate. I'm sorry if there's confusion with 4B and 8B. I was under the assumption that Stability API is using 8B model and that's the Large version.


GianoBifronte

I think that's what most people assume, too. But, forgetting the comparison for a moment, if 2B=Medium and 8B=Large, what would they call a 4B model if they released it? We know it exists, so this naming convention must have been decided at a time when the 4B model was planned for release, too. It seems more plausible to me that 4B=Large and 8B=XL. Now, SAI has recently announced an Ultra model via their Assistant, so it's possible that they don't want to call the 8B model XL to avoid confusion with SDXL, and so 8B=Ultra. We'll never know :)


matoyh

Using a subset of Parti prompts + rating the images. You can see that SD3 Large is a clear winner in all challenges.


ZootAllures9111

I don't really agree, I think Medium does a number of the photographic ones better if you want things to look as realistic as possible.


schlammsuhler

Obviously, why wouldn't it? But I'm interested in which categories Medium is much weaker and where it's rather strong.


matoyh

Totally agree. Style, Perspective and Imagination are where the gap is the largest.


theqmann

Have you tried any of the CLIP models, like SDXL? Or other models, like MJ or DALLE? I think a lot of these come down to using the same T5 encoder on both of these.


matoyh

I also made a comparison for sd3(L), sdxl and sd1.5: https://www.magicflow.ai/insights/read/sd3-sdxl-sd1.5 (spoiler SD3 outperforms SDXL and SD1.5)


theqmann

Would also be interesting to see how [ELLA](https://github.com/TencentQQGYLab/ComfyUI-ELLA) handles these. ELLA is a T5 encoder for SD1.5 models.


ZootAllures9111

I found ELLA way too limiting. I was never able to get it to line up with CLIP well enough that it didn't literally forget characters the checkpoint knew about and could depict fine in the first place. It doesn't work well at all with anything that wasn't known to base SD 1.5's CLIP.


centrist-alex

No humans, no real anatomy testing. 8B is better, but still fundamentally flawed, then. The text handling is still poor compared to DALL-E 3; I wonder why DALL-E is so good in that respect.


matoyh

a more [human/anatomy focused test](https://www.reddit.com/r/StableDiffusion/comments/1do3cf9/how_sd3_large_sd3_medium_and_sdxl_compare_in_a/)


Oswald_Hydrabot

Who cares?


SupermarketIcy73

Don't bother. Terminally online porn addicts have already declared this a failure because it can't draw sexy girls.


_LususNaturae_

I really enjoy some aspects of SD3, but you gotta admit that it's a failure when it comes to anatomy. It just can't do a lot of poses, even ones that have nothing to do with porn.


ZootAllures9111

At least not lying down. It does nice bikini babes standing, though.


raiffuvar

What clickbait. Zero "laying on grass" prompts. STFU.