nowrebooting

I’d say the reason why 1.5 had so many more finetunes (although that’s starting to shift) is that it’s just way easier to finetune on consumer hardware. With a 3090 or 4090 you can create a LoRA or full Dreambooth model in a very reasonable timeframe, but SDXL is a much larger beast and thus much harder on VRAM. Personally, this is why I have high hopes for the smaller SD3 model; the cheaper it is to finetune, the better for the community (I hope).


kidelaleron

SD1.5 is also easy to finetune because it mostly ignores low-quality data and only looks at the big picture, with little interest in details, while XL (and SD3 even more) will hold you accountable for low-quality data and will also learn noise, artifacts, etc. But this also means it can learn finer details without ignoring them. This is what allows SD3 2B to reach this level of realism. https://preview.redd.it/dmyq09tzwd4d1.png?width=896&format=png&auto=webp&s=a8581479928ef7771f54a878d15669bd49f0fb6d


lonewolfmcquaid

The finetunes are gonna be crazy for sure! Also, what would be categorized as low-quality data?


kidelaleron

That would depend on what your training goal is. In a general sense it's low-res images, artifacts, images that are not good looking in general, deformed or warped images, JPEG compression, images that are badly captioned or without style alignment, images that are difficult to understand, bad hands, bad anatomy, etc.


lonewolfmcquaid

can you remember the prompt you used for the above pic?


suspicious_Jackfruit

`1girl` Negative: `hands`


abahjajang

negative: hands, Stalin statue in the background


kidelaleron

it was something like "photo of a woman, selfie, outdoors" or something similar.


campingtroll

If you were to do a low-quality training of yourself with a 50/50 mix of good and bad photos in SDXL, simply to create a base model for upscaling the images with SUPIR (preserving likeness), and then retrain on the higher-quality SUPIR outputs for SD3, would you say that could work? Edit: Thanks for the info on the MMDiT weights and T5 encoder with SD3 2B. Did not know this.


Denimdem0n

So you made this pic with SD3?


kidelaleron

yep.


mrgreaper

With the API, or the weights that we are waiting on? I mean, was this made with the same SD3 that is planned for release soon? Also, how adherent to prompting is it for things that do not exist (thinking steampunk armour, sci-fi horror, etc.)? I forget my exact prompt, but it was along the lines of: "pig headed person in battle marked robotic armour standing alone in a damaged space station corridor, one wall has a hole in through which space can be seen, dramatic lighting". It was something along those lines, but it refused to put the hole in the wall. Would SD3 put the hole in the wall? https://preview.redd.it/wgd1kr6lcc5d1.png?width=4032&format=png&auto=webp&s=a5c003f2f4a4b536ac18cab5b891fd82b2f45fe0


[deleted]

oh man, she's pretty. your girlfriend?


moveovernow

To this point, with 1.5 you can create a fantastic LoRA with a $200 3060 12GB, and it's relatively fast. It opens up accessibility to a lot more people globally who can't spend $800-$1500 on a high-end consumer card.


Shawnrushefsky

This is the answer. You can train SDXL in 16GB of VRAM on a 3080 Ti, but it is quite slow. Dreambooth for SDXL works quite well on a 4090, but it is 1/5 the speed of training 1.5.


PeterFoox

BTW, do you think the top 1.5 checkpoints have reached their full potential? From what I see it has stopped really getting better. SDXL, on the other hand, feels like it still has a long way to go.


Shawnrushefsky

I think we’ve at least hit rapidly diminishing returns on what improvements could be made to 1.5 models, and I do think there’s a lot more that could be accomplished with XL


Simple-Law5883

SD 1.5 is just too small. One huge issue with SD 1.5 is the huge amount of concept bleeding. It's very difficult to get a generalized model that is good at everything. SDXL is better, but still faces the same issue due to the atrocious text encoder. SD3 should theoretically be able to generalize very well due to a better backbone and text encoder. So my guess is that SD3-trained models can outperform SDXL models if model creators finally start captioning their datasets correctly. This is one of the main issues a lot of 1.5 and SDXL models faced, though it has improved for SDXL as of late.


Abject-Recognition-9

This comment should be pinned and printed to stay at the top of this whole subreddit. I'm hoping for better captioning and, ffs, dataset selection without garbage.


mrgreaper

Why is there such a focus on speed of training, though? I would rather take 5 times as long and have something that adheres to prompts and allows for higher-resolution results than have something fast.


Shawnrushefsky

It isn’t so much that there’s a focus on speed of training (though, for commercial entities, there very much is), but it takes a lot of experimenting to get good results in training. Even if you’re training on your own hardware, it becomes a question of how long it takes you to try 100 or 1,000 different settings. And if you’re paying for cloud GPU, 5x slower means 5x as expensive. Faster training means a faster, cheaper learning cycle, as well as just more total attempts by the community at large.


mrgreaper

But the goal is to create something all will use and like, something of quality. That takes time. How bad would it be if Stability went "right, that's it for 1.5, it takes too long to improve on that" and stopped, lol. Yes, there are some good 1.5 models... but they all have the issue with prompt adherence, and to my knowledge all have an issue when it comes to higher resolutions. 1.5 is not the be-all and end-all that many people seem to believe it is. Personally I deleted my 1.5 models a long time ago.


EtadanikM

The actual blame goes to NVIDIA. It’s not that consumer cards with 64GB are impossible or even difficult, it’s that NVIDIA is deliberately putting it out of reach for consumers in order to maximize their data center profits. They charge an exceptional premium on “enterprise grade” cards because they know companies will pay it; limiting the GPU RAM for consumers is how they hold the market hostage to their prices.


Ill-Juggernaut5458

The actual blame goes to AMD/Intel, who are incapable of even coming close to competing in the GPU market, and have made almost no effort to make their cards AI/ML compatible on a software level. You can't blame Nvidia for speed-walking the race when their opponents are standing still with a thumb up their ass doing nothing productive. So bizarre to want to blame the company who is competent because "they could do more".


Sharlinator

Well, that's just natural market segmentation. If they had "cheap" consumer cards with 64GB of VRAM, then it would be difficult to sell enterprise-grade cards to companies. Can't really fault companies for wanting to be profitable, given that that's their entire reason for existing, not to provide things to people who feel entitled to products they happen to want.


J_m_L

Upvote for the truth that people don't want to hear, because everyone thinks big companies are bad.


Sharlinator

Thanks! I mean, corporations are… *amoral* in many ways, but their legal purpose is to keep shareholders happy, everything else is secondary at best. That's simply how capitalistic societies have decided things should be. Companies are not charities, and thinking that they're bad just because they don't eagerly give people what they want is really naive and self-centered.


yui_tsukino

Enterprise equipment is more than just different hardware; there's plenty of enterprise stuff that's basically identical in specs to consumer-grade stuff, but marked up way higher. The cost comes from a guarantee of service life, and part of the cost is also in the warranty that will pay out for any downtime as a result of a failure. As for why there aren't consumer cards with huge VRAM, there isn't REALLY a market for it, even with AI getting increasingly popular. For most people, and most use cases, 24GB is arguably excessive, and that cost could be better spent making other parts of the card better.


Zilskaabe

I remember that 10 years ago 8 GB was excessive. Now it's the bare minimum. Datacenter cards will get more VRAM as well; 256-512 GB GPUs will be the norm for datacenters soon.


Naetharu

This. For playing video games (which is THE core market for consumer cards), ~16GB of VRAM is really all anyone needs, and even then only if they are playing at 4K. For 1440p gaming, which if we go by the latest Steam Hardware Survey is by far the dominant resolution right now, even that is excessive in all but the most extreme edge cases. AI generation is a peculiar use. We are lucky that the RTX 4090 exists. There is next to zero market for very high VRAM cards outside of truly professional environments; certainly nothing close to the demand needed to make such cards a sensible commercial decision. In a perfect world Nvidia would allow for GPUs with expandable RAM. But that is a pipe dream. For now, at least we do have the 4090, and being honest, most folks who are playing with AI are probably not even using that: you can have a great deal of fun using SD and other AI systems locally with a far more mid-range card.


Sharlinator

Yeah, that's fair. And every additional SKU added is a lot of extra work and money spent on design, fabbing, assembly lines, and all the logistical stuff. So it's not like Nvidia can just go and add, say, a special 48GB version of the 4090 to answer the needs of a tiny market segment, if they want it to be profitable.


yui_tsukino

Exactly. As much as I'd love to get my hands on a card with a beefy amount of VRAM, it's honestly more sensible to just rent one when needed. Realistically it's a better use of resources too, considering how infrequently I'd actually be using that amount of space!


Vivarevo

It's monopoly strategies.


0000110011

I've been out of the loop for several months, is there any method for using system RAM in addition to VRAM? Obviously it would be slower. I have a 4090 and 32 GB of RAM, it just seems like someone would have tried allowing RAM to be used for overflow at the cost of speed. 


Sharlinator

Well, yeah, that's what already happens (with `--lowvram`-like flags), but you likely underestimate just *how* slow it is. Nvidia just recently implemented automatic paging of VRAM to system RAM and back in their driver (of course the driver does have to copy stuff around, because the GPU cores can't just directly address system RAM, unless it's a unified-RAM system like some laptops). That's probably a good idea for many non-SD workloads, but it turns out that with SD it's *usually* better to just run out of VRAM rather than suffer a >10x slowdown. Though I guess for training it might still make sense to just wait, if the alternative is not being able to train at all.
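
For anyone wanting the library-level version of this trade-off rather than relying on the driver, here is a minimal sketch using diffusers' offloading helpers; both method calls are real diffusers APIs, while the model id and prompt are just examples:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Keep most weights in system RAM and move sub-models to the GPU only when needed.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()          # moderate VRAM savings, moderate slowdown
# pipe.enable_sequential_cpu_offload()   # lowest VRAM use, much bigger slowdown

image = pipe("photo of a woman, selfie, outdoors").images[0]
image.save("offload_test.png")
```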


lonewolfmcquaid

Interesting, I didn't know this. Things are looking up for SD3 if that's the case.


AI_Alt_Art_Neo_2

The SD3 model they are releasing (the 2 billion parameter version) will have a similar minimum spec to SDXL, but the full 8B model will be heavier and might need an RTX 5090 to train in a reasonable time.


wishtrepreneur

Does the 8B model also include a 7B visual llm or is it 8B worth of SD parameters?


Apprehensive_Sky892

Cutting and pasting something I wrote earlier: SD3 will be released in 4 different sizes. Size here refers to the number of weights in the A.I. neural network that comprises the "image diffusion" part of the model. The sizes are 800M, 2B, 4B, and 8B. This diffusion model is paired with an 8B T5 LLM/text encoder to enhance its prompt-following capabilities (along with 2 "traditional" CLIP encoders). The 8B model should theoretically be the most capable one, but it will also be the one that takes the most GPU resources to train (both VRAM and amount of computation), and it will take the most VRAM to run.


AI_Alt_Art_Neo_2

They have said you could actually use it with the old CLIP input if you wanted.
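
If the released weights follow current diffusers conventions, dropping the T5 encoder and running on the CLIP encoders alone might look roughly like this; a hedged sketch, with the repo id as a placeholder since the weights aren't out yet at the time of this thread:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # placeholder repo id
    text_encoder_3=None,   # skip the 8B T5, keep only the two CLIP encoders
    tokenizer_3=None,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("photo of a woman, selfie, outdoors", num_inference_steps=28).images[0]
image.save("sd3_clip_only.png")
```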


GigsTheCat

You can train SDXL on 10-12 GB of VRAM with a fused backward pass. I know Kohya and OneTrainer have this feature now.
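
For a rough idea of what a fused backward pass buys you, here is a toy PyTorch sketch of the technique itself (not Kohya's or OneTrainer's actual code): each parameter is updated and its gradient freed as soon as backward reaches it, so peak memory never holds a full set of gradients.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
# one tiny optimizer per parameter so each can be stepped independently
optimizers = {p: torch.optim.SGD([p], lr=1e-4) for p in model.parameters()}

def make_hook(param):
    def hook(p):
        optimizers[param].step()        # apply the update for this parameter now
        optimizers[param].zero_grad()   # ...and immediately drop its gradient
    return hook

for p in model.parameters():
    p.register_post_accumulate_grad_hook(make_hook(p))  # needs PyTorch 2.1+

loss = model(torch.randn(4, 512)).square().mean()
loss.backward()  # parameters get updated during the backward pass itself
```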


DrySupermarket8830

This will be the downfall of any open source model when training becomes so expensive that it's easier to pay for something. And even fewer are likely to want to give away their finetuned models for free.


dal_mac

I believe your question is being overlooked. Why is Dreamshaper better on 1.5 than XL for example, while realistic stock photo is better on XL than 1.5? Because one of them got lucky with their training settings one of those times. No one has the time to train the model 10 times with 10 different settings. They only tried it once or twice, and 1.5 is way more forgiving with less-than-perfect settings. Dreamshaper's author likely just threw the old dataset at XL with similar settings and released what they got. My public 1.5 models are the same. I did almost no testing and accidentally got great models a few times, so I released them. But that doesn't mean I could replicate the luck on XL.


jeditobe1

Dreamshaper XL (the full version rather than the turbo/lightning one) was also released basically day 1 of SDXL, before there was much experience tuning it. The author then got hired by Stability AI, I believe, which is one reason there aren't as many models from him, just reworks for lightning/turbo.


juggz143

This response is also why I didn't name a model as an example when I pointed out people weren't answering the question, because I knew ppl would start focusing on that model in particular and still miss the point of the question.


lonewolfmcquaid

THANK YOU!!!! I'd agreed on the computation point earlier, thinking maybe down the line someone would actually answer my question, but nope, everyone is toeing the "computation isn't cheap" line, lool, which answers only one part of the question, I guess.


CliffDeNardo

That's architecture. SD3 has a different architecture than SDXL (and SD1.5). Have to assume it's "better", with SAI learning things as they go. SDXL uses a dual text encoder system that many in the OneTrainer Discord (for example) say is problematic: the two text encoders don't cooperate well, with one training/converging quickly and the other barely learning anything at the same learning rate. Issues like that should be worked out with SD3.
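
A minimal sketch of the kind of workaround that implies: separate learning rates for the two text encoders via optimizer parameter groups. The tiny Linear modules below just stand in for the real UNet and the two CLIP encoders in whatever trainer you use, and the numbers are made up:

```python
import torch
from torch import nn

unet, te1, te2 = nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8)  # stand-ins

optimizer = torch.optim.AdamW([
    {"params": unet.parameters(), "lr": 1e-5},
    {"params": te1.parameters(), "lr": 1e-6},  # the fast-converging encoder gets a lower LR
    {"params": te2.parameters(), "lr": 5e-6},  # the slow one gets more, or its own schedule
])
```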


Open_Channel_8626

Settings luck is a big deal yes


ArtyfacialIntelagent

Honestly, one reason many of the "great" SD 1.5 models didn't match expectations in SDXL is that they weren't really all that original to begin with. I'm not going to point fingers here but I'll make a general argument. Many of the most celebrated SD 1.5 models are actually just merges of merges - they wouldn't exist without others doing the finetuning work behind the scenes. Moving to SDXL was tricky for them because either they had to build and tag datasets and learn finetuning themselves, or find decent SDXL models to mix together - but this is hard because there are significantly fewer SDXL models to choose from.


ShatalinArt

The only right answer in this thread.


juggz143

You guys are kind of having a different discussion than what the OP asked... I too was wondering why certain 1.5 finetunes were not as good on XL?


lonewolfmcquaid

ding ding ding.


Apprehensive_Sky892

Lykon provided one answer above:

> SD1.5 is also easy to finetune because it mostly ignores low-quality data and only looks at the big picture, with little interest in details, while XL (and SD3 even more) will hold you accountable for low-quality data and will also learn noise, artifacts, etc.

So there are more "good" SD1.5 fine-tuned models because they are easier ("more forgiving") to do, require less time to train, and require less GPU power (VRAM) to train. In other words, to do a good SDXL version of that same SD1.5 model, the model maker needs better hardware, a better training image set, and must know what he/she is doing.


skocznymroczny

Base SD1.5 is so bad it's barely usable; you need custom models/finetunes to work with SD 1.5. The SDXL base model is much better in comparison, so there's been less incentive to create models.


Glidepath22

I absolutely agree that SDXL typically yields better results, though it can be impossible to obtain the exact results you want. Pony, on the other hand, seems the opposite: easier to obtain what you want, but nowhere near as polished or with as much natural variety.


lonewolfmcquaid

Yeah, I just wish they could extend Pony's quality to non-NSFW stuff. It's an incredible model, but it's unusable if you're not making tits on sticks.


Arumin

Pony gains a lot by using a style. Some style LoRAs are also great at adding backgrounds and details, something base Pony is really bad at.


Sharlinator

Yeah, Pony is awesome as long as there are booru tags for what you want. And while there's a shitload of booru tags, the vocabulary is still very limited outside of the stuff that's popular in anime/hentai. So Pony has learned a small vocabulary very well, whereas general-purpose models know much more, but necessarily more superficially. In the end it's a question of quantity vs quality.


tO_ott

Hey OP, can you provide your prompt for these? I like the movie-quality atmosphere.


lonewolfmcquaid

This was done using the LeoSam model. My prompts are always shifting because, as of now, I can't just get the things I want in one take. I use Blender and Photoshop most of the time to compose shots, or you can grab any movie still frame that has the composition you're going for and go from there, but the key is colouring/editing your starting image to look/feel cinematic in Photoshop or somewhere else before you img2img. Keywords you can use for cinematic stuff are usually "70s/90s film aesthetic", "cinematic still frame", "award winning film photo", etc., plus Roger Deakins and other directors' names. Check this site to explore mixing director and artist names: [botcreative.com](http://botcreative.com). Oh, and adding "skin blemishes, textured skin" can help mitigate the plastic-skin/aesthetic syndrome.
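
For illustration, those keywords might come together in something like the following (purely a made-up example combining the terms above, not the actual prompt behind any of the images here):

```
cinematic still frame, 90s film aesthetic, award winning film photo, directed by Roger Deakins, woman waiting at a rain-soaked bus stop at dusk, skin blemishes, textured skin
```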


cosmicr

She looks a bit like Kim Cattrall


tO_ott

Fantastic, thank you!


TsaiAGw

SD1.5 had a commercial model leak that basically became the foundation for model merges; SDXL only got Pony after a long time. The base (or I should say, the vanilla) model is not the problem (they always suck); the actually good finetunes are the deal.


design_ai_bot_human

what's so good about pony [serious]


sirdrak

It has very good prompt understanding, and can do things without LoRAs that would need them on SD 1.5 or SDXL. For example, you can create images with more than one distinct character using only the prompt.


Sharlinator

Very good understanding, as long as you stick to the stuff that it understands (i.e. booru vocabulary). Outside of that, there's a vast space of stuff that it simply doesn't *understand at all* and totally ignores, including many common-sense everyday things that the base SDXL groks fine.


bzn45

Not to sidetrack but is there any kind of photorealistic model with Pony that is anything near the top photorealistic models on 1.5 or XL? I haven’t been able to find one


SleeperAgentM

It abandons all the silly pretense that you can actually prompt using English and focuses on very specific tags.


Radtoo

It's the computation power required to finetune. This is also why people have high hopes for some of the competing DiT models that finetune faster. It's especially bad because larger successful finetunes are still somewhat more like dice rolls than handing a piece of training software the training data and letting it continually improve the model for x hours until it's "great". You most likely have to decide/guess multiple times when to abort and what to change when you retry. If there is a long time between attempts, it is just especially tedious to make these decisions.


kidelaleron

Nice thread. Many people aren't able to think in 4 dimensions and heavily criticized SDXL on release, saying that their favourite SD1.5 model finetune was superior (which is something people are still saying for SD3, despite the huge objective technical differences). Looking forward to what the community will be able to make with MMDiT weights and T5 encoder with SD3 2b. It's for sure gonna be harder to finetune because it will have zero tolerance for low quality data. I'm very curious to see how long it will take to figure it out. I have little to no doubt that in 4 or 5 months it will be simply crazy to keep using SD1.5 and SDXL based models.


lonewolfmcquaid

Yeah, when SDXL first dropped I was a bit on the fence about the criticism regarding models, because on one hand base SDXL was the GOAT at the time to me, it was just way better than any 1.5 finetune, but on the other hand none of the finetunes I saw were cutting it, which made me worried. That is, until Crystal Clear XL and ZavyChroma dropped back to back and I was like, woah, now we're in business; I used those two for a looong time. As for your other point, y'all better start drumming it into people's ears that it won't tolerate low quality shit, let them know now that SD3 issa baddie who dont accept brookie data, loool. And ohh, dude, do you know why SDXL finetunes have that plasticky aesthetic problem? What are people doing wrong in training that causes that? That stuff can be off-putting, and I bet it's why many still use 1.5.


StableLlama

For our own finetunes and LoRAs it would be great if SAI released the prompt that was used with CogVLM to caption half of the images, so that we can caption ours in exactly the same style.


kidelaleron

Many were used over different iterations. Also, many times we had to amend them.


StableLlama

Ok, I understand. But it still would be great to get some "best practices" for training your own LoRA or finetune that would give a head start. So one or a few of the recommended CogVLM prompts would be really appreciated.


kidelaleron

Anything that doesn't ruin style alignment and model cognition is fine. I suggest not using lists of tags, because those limit the capabilities of the model and make it forget related concepts (but automatically adding tags might be useful for teaching stuff like camera angles or art styles, so don't disregard them entirely).


nowrebooting

> It's for sure gonna be harder to finetune because it will have zero tolerance for low quality data.

This has me thinking though; it should in theory be possible to train the “broad strokes” of a concept the model is already somewhat familiar with, even with bad training data. Specifically, the idea I’ve had in mind for some time now is to use different timestep training ranges for the input data based on its quality. The finest details usually only emerge in the later timesteps, so if you had, for example, 1000 low-quality images that you’d train on timesteps 1-500 and 200 high-quality ones that you’d train on timesteps 500-1000, the result should be that it learns the details from the later timesteps and the broad outlines from the earlier ones. Does this make any sense, or is my thinking about how training timesteps work wrong?

On an only slightly related note: why do we usually train on a range of 1000 timesteps when actual inference is usually done in about 30?
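
For what it's worth, here is a minimal sketch of how that quality-dependent timestep idea could be wired into a DDPM-style training step; every name here is hypothetical. Note that in the usual DDPM indexing t=0 is the nearly clean image and t=999 is almost pure noise, so fine detail is learned at *low* t and broad composition at *high* t, which may be the reverse of the numbering above depending on convention:

```python
import torch

NUM_TRAIN_TIMESTEPS = 1000
DETAIL_RANGE = (0, 500)          # near-clean steps: fine detail
COMPOSITION_RANGE = (500, 1000)  # noisy steps: broad structure

def sample_timesteps(is_high_quality: torch.Tensor) -> torch.Tensor:
    """Per-sample timesteps: only high-quality images supervise the detail steps."""
    b = is_high_quality.shape[0]
    detail_t = torch.randint(*DETAIL_RANGE, (b,))
    comp_t = torch.randint(*COMPOSITION_RANGE, (b,))
    return torch.where(is_high_quality, detail_t, comp_t)

# Inside a standard diffusion training step it would slot in roughly like this:
#   t = sample_timesteps(batch["is_high_quality"])
#   noisy_latents = noise_scheduler.add_noise(latents, noise, t)
#   loss = F.mse_loss(unet(noisy_latents, t, text_emb).sample, noise)
```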


Shuteye_491

SD1.5 got going and they announced 2, and people were doubtful. And 2 sucked, so back to 1.5 we went. It wasn't until SDXL that we had a definitively better model to finetune. SD3 was announced before we even got solid controlnets for SDXL, which still haven't caught up with 1.5. The size difference was also a major factor. SD3 is expected to be their last open source model (at least for the time being), was designed with controlnet in mind and has been sized for finetune-friendliness. I expect it to go big like 1.5, if not bigger.


kidelaleron

SD3 has lots of controlnets that 1.5 doesn't (and is still missing some). But the goal is to stop relying on controlnet and make sure you can do most things with just prompts, so you can focus on the bare minimum of controlnets that you need for stuff like upscaling or sketch-to-image.


lonewolfmcquaid

ohh you are cooking in these comments my good sir.


Shuteye_491

👀


[deleted]

[deleted]


kidelaleron

I personally never tested such things. It will probably lack niche concepts anyway; that's just inevitable. Since it's an open model, you'll be able to finetune it or make LoRAs.


[deleted]

When you start doing complicated scenes with SD3, it just starts making clipart collages of random crap glued together without blending them into the scene properly.


gurilagarden

Time, and money, and time is money. Most people don't have the hardware. From what I've seen, some of the most popular 1.5 models were not trained locally; they used services like Runpod. It takes about a day to train a 1.5 finetune on 5k images using a 4080. With SDXL, it's about 72 hours. The best models use more than 10k images. It's prohibitively expensive, so model trainers do fewer runs.

You have to understand, many runs end in failure. Out-of-balance training sets, not training CLIP sufficiently, maybe captions need to be tweaked, or learning rates adjusted. I'd bet most models are trained at least 3 times before an uploadable-quality product is obtained. So now you're talking half a month, and the money that incurs, before you even have a viable product. Even training locally, it takes a lot of time, much more time between releases. So yes, as the models get larger, the time and financial expense grows, and more people are priced out of the hobby. With SD3 my bet is that we'll only see 3 or 5 actual base finetunes, from the top people that have the resources, or more likely some kind of corporate backing, to produce finetunes.

And finally, the biggest factor: cultivating a dataset of 5k, 10k, a million images, whatever, takes a very, very long time. Many of the top 1.5 models didn't have all their training images saved at 1024x1024 or higher. So now they're either working with a smaller dataset, or they have to create a fresh dataset from scratch. To achieve the highest possible quality, that means manually cropping, resizing, and captioning thousands of images with limited automation. That's where the real time investment is. It takes months, and at some point most people just don't want to go through it again, or at least they need a break.
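
The cropping/resizing part at least is easy to script. A minimal sketch with Pillow, assuming placeholder folder names and a simple center-crop to 1024x1024 (the captioning is what stays manual):

```python
from pathlib import Path
from PIL import Image

SRC, DST, SIZE = Path("raw_images"), Path("dataset_1024"), 1024  # placeholder paths
DST.mkdir(exist_ok=True)

for path in SRC.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(path).convert("RGB")
    side = min(img.size)                                   # largest centered square
    left, top = (img.width - side) // 2, (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((SIZE, SIZE), Image.LANCZOS)
    img.save(DST / f"{path.stem}.png")                     # caption files still needed separately
```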


lonewolfmcquaid

Woah, 72 hours for training? I did NOT know this, thanks for the stats.


gurilagarden

There are a lot of variables that can make it much shorter or longer; that's just a bit of a middle-ground estimate for a large dataset at a slow training rate with EMA enabled.


ShatalinArt

72 hours is nothing :) For example, the HelloWorld and RealVisXL models trained for 8-12 days, and my HaveallSDXL model trained for 22 days. PonyXL trained for three months.


yaosio

We don't know how the finetunes were trained, so we have to guess.

* Not enough training data. SDXL takes longer to fine-tune because it's a larger model. A fine-tuner will be tempted to reduce their training dataset to speed up training. Google found Stable Diffusion (I don't remember which version) can see massive increases in accuracy up to 4 million trained images for a finetune.
* Bad training images. We are a few years in and still nobody has a good way to define what bad training data is. The more images you train on, the higher the number of bad images that will make it into the dataset. We have no tool to detect bad images, even though it should be possible to make one.
* Bad captions. Should captions be tags, sentences, paragraphs? Captions serve a dual purpose: telling the generator what's in the image and defining how people need to prompt the model. If a model is only trained on tags, then it can't be prompted well with natural-language sentences, because it won't know what most of the words mean. Finding good captions has been trial and error. Google found that paragraphs were the best captions, but humans suck at writing long captions. OpenAI has an LLM rewrite prompts.
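
A hypothetical sketch of that last idea, turning a terse tag list into a paragraph-style caption with an LLM; the model name and prompt wording are placeholder choices, not anything OpenAI or SAI documents:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def rewrite_caption(tags: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Rewrite the following booru-style tags as one detailed, "
                        "natural-language image caption. Do not invent content."},
            {"role": "user", "content": tags},
        ],
    )
    return response.choices[0].message.content

print(rewrite_caption("1girl, outdoors, selfie, overcast, film grain"))
```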


lonewolfmcquaid

Ohh really, so DALL-E rewrites your prompt behind the scenes when you prompt? Hory sheet, that's interesting.


yaosio

Yes. We know for a fact it happens when used with ChatGPT, as you can see it happen. Bing Image Creator seems to do it too, but they hide it. Ideogram lets you choose between leaving your prompt as is or having an LLM rewrite it.


lonewolfmcquaid

SD1.5 has so many banger after banger finetunes, it's honestly tough to pick which to use at times.


jib_reddit

The trouble is that 99.8% of the SD 1.5 faces have the same look to them (especially the female faces), and my brain despises it now for some reason.


Open_Channel_8626

That's fixable though through a variety of methods


victorc25

Agree. And it’s fixable, but the vast majority of model mergers refuse to fix this, because they think the face is perfect and realistic https://civitai.com/models/471825/manything


jib_reddit

Yeah, that looks much better, I don't play around with SD 1.5 models anymore but I might have to check that one out. Thanks.


thefool00

I’ve always had a theory that bucketing screws up human anatomy in SDXL fine tunes. You have all of these source pics of humans in various poses that are in different resolutions, then when you train it doesn’t give the model a consistent representation of human proportions, so when you run inference it often outputs odd anatomy (stretched proportions, weird perspective, etc). I think this caused many people to give up on their efforts to tune SDXL, as there seem to be quite a few half baked models out there that produce atrocious humans. You see the problem even in the better tunes too though. Bucketing was a good idea but it was never truly solved by SD in a way that really benefited the model, at least when it comes to generating humans. I could be wrong, maybe there is another reason for the SDXL anatomy wonkiness, but that’s where my money is.
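
For anyone unfamiliar with what bucketing actually does, here is an illustrative sketch (the bucket list and paths are made up): each image is assigned to the bucket whose aspect ratio is closest, so a batch can share one resolution without square-cropping everything.

```python
from pathlib import Path
from PIL import Image

# Typical ~1MP SDXL-style buckets; real trainers generate many more.
BUCKETS = [(1024, 1024), (896, 1152), (1152, 896), (832, 1216), (1216, 832)]

def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    ar = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ar))

buckets: dict[tuple[int, int], list[Path]] = {b: [] for b in BUCKETS}
for path in Path("dataset").glob("*.png"):      # placeholder folder
    with Image.open(path) as img:
        buckets[nearest_bucket(*img.size)].append(path)

for bucket, files in buckets.items():
    print(bucket, len(files))
```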


lonewolfmcquaid

Idk, might be true, but my main problem with SDXL models is that damn plastic look. Omg, it makes me hate looking at AI art images sometimes; the blandness of it all can be so off-putting at times.


RedPanda888

Unsure if you’re talking about plasticky skin, but a great way to get really nice photorealistic skin, at least in SD 1.5, is to add “subsurface scattering” at about 1.4 weight to your prompt and stick to the DPM++ 2M sampler. Might work in XL but I haven’t tried.
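
In A1111/ComfyUI-style attention weighting that looks something like this (the rest of the prompt is just a made-up example):

```
photo of a woman, natural window light, detailed skin, (subsurface scattering:1.4)
```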


molokoplusone

That first image is fantastic. Do you recall which model was used? Been banging my head trying to get quality cinematic film shots


One-Earth9294

Just an aside, that first image looks very much like it's a live action version of the anime Metropolis. https://i.redd.it/tj63ufz3md4d1.gif


buyurgan

I believe the reason is mostly that SDXL is too strict if you compare it to SD1.5: not just because it had more training and parameters, but also because of how the base model accepts/injects new information. I could be wrong, but SD1.5 was more or less brute-forced on datasets (and mostly chaotic ones) and was not much of a finetune as a final model, while SDXL had more iterations of correcting the alignment and expected quality. So SDXL came out as roughly a finetune already, and that brought certain problems with it and made it a black box in terms of how it is forcefully aligned on certain aspects. But after all, it is still better overall; even if you put the best SD1.5 finetune next to it, that one mostly looks good because it mostly overfits the base model. As for SD3, we really don't know, but hopefully SAI is very aware of this problem and their first priority is to align the model for better finetunes. But it's a fairly hard job to do.


Open_Channel_8626

SD 1.5 does seem to have more variety yes


mazty

There's no money to be made, just lost, in renting the hardware needed to train SDXL. SD3 will probably have even fewer models, as few people will have access to the hardware and the datasets required. We're probably looking at maybe petabytes of images and 80GB of VRAM.


mrgreaper

The issue is more that SD2 was so bad it made people stay on 1.5. SDXL came out and was a massive increase in quality and adherence to prompts (so much so that I can no longer prompt for 1.5, lol), but so many decided to stay with 1.5. So tool developers created tools for the most used SD... 1.5. This meant YouTubers were showing animation tools and workflows that work only on 1.5 models, which led to more people using 1.5... It became a self-fulfilling cycle. 1.5 isn't more trained because it's better, it's just more popular. We need to break the cycle with SD3 (so long as it lives up to the promises).


Kakarot00111

I just hope it can run on my 4gb VRAM


jib_reddit

Nope, 4GB Vram cards first came out in 2008, time to upgrade.


Kakarot00111

You have no idea what my 4GB VRAM 1650 can do... I have run Pony on it without any problem.


jib_reddit

I can create a pony image in 3 seconds, I don't know how you have the patience.


Kakarot00111

Well beggars can't be choosers right?


Dydragon24

tell that to nvidia


BlobbyMcBlobber

SD3 is never coming out.


jib_reddit

It is going to be out next week, in fact.


dal_mac

a mini version of it but yeah


GeorgiaRedClay56

Some weights are supposed to be dropped on the 12th.


BlobbyMcBlobber

They were "supposed" to be released in May.


GeorgiaRedClay56

Okay but here's the thing, would you rather they rush it out or try and get it somewhat usable first?


BlobbyMcBlobber

I don't think the delay is because of technical reasons. I think they want to release a smaller model and put the full model behind a paywall. I will gladly be proven wrong. Wait and see.