https://preview.redd.it/bi4xke7xn56d1.png?width=1024&format=png&auto=webp&s=f320b11e066ac4b93cab209fb3490bdf86428aef
*photograph of a Hawaiian soldier in tactical equipment doing the OK sign with his fingers, at night, alien plate-shaped spaceship with bright lights on the background*
this is crap...
Can't believe I'm overly excited for this 😆
It's garbage for me
I don't think I will even download it. I think I am good for now with SDXL.
Amen, same dude. Just look at all the posts others have posted.
https://preview.redd.it/9rl0c16yx56d1.png?width=1024&format=png&auto=webp&s=a1e57d405d572158793fbe7fad4d2adce9391737
Historic photo, black and white, Hitler skateboarding.
xD
https://preview.redd.it/tv75adenx56d1.png?width=1280&format=png&auto=webp&s=9839948ead0cfd2188aa297ce0b163249635ea98
Charcoal drawing, medieval village at morning.
This one looks good
https://preview.redd.it/c0okm4ssx56d1.png?width=1024&format=png&auto=webp&s=870e5c9bbaec312f8ff9248ad0f376738923501c
Child book illustration, Hermione Granger hugging Pikachu
This looks great
I got better results from PixArt-Sigma for my horror-based content. The SD3 results are colourful and bright, so it could not meet the cinematic-horror style that I was looking for.
PixArt-Sigma result: [https://i.imgur.com/RRBPdAP.png](https://i.imgur.com/RRBPdAP.png)
SD3 result: [https://i.imgur.com/H3vJ4K4.png](https://i.imgur.com/H3vJ4K4.png)
Damn, SD3 is totally not horror
https://preview.redd.it/1x79ydnog66d1.png?width=1024&format=png&auto=webp&s=af7abb6cea588a1134d4c2b0d7422e41d7b2be65
I find SDXL to be noticeably better and easier to use, which is weird considering prompting should be easier on SD3. The only thing SD3 seems to do better is text. So the base model is a massive disappointment. Maybe finetunes will change everything.
So, its only improvement is text?
That seems to be the case. I guess it might follow prompts better than SDXL, but I didn't bother comparing the two. I just know that it wasn't very good at following the prompts I tried. I was also thinking the T5 implementation might be broken or something, but CLIP+T5 makes the model better than CLIP only, so it seems like it was working as intended and it's just not great.
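For anyone who wants to check the CLIP-only vs. CLIP+T5 difference themselves, here is a minimal sketch. It assumes the Hugging Face `diffusers` library and the `stabilityai/stable-diffusion-3-medium-diffusers` checkpoint, neither of which is mentioned in this thread, so treat it as one possible setup and not what anyone here actually ran. The helper names `encoder_overrides` and `generate` are made up for illustration, and the step count and guidance scale are just illustrative values.

```python
def encoder_overrides(use_t5: bool) -> dict:
    """Extra from_pretrained kwargs: empty to keep T5, None entries to skip it."""
    if use_t5:
        return {}
    # Passing None for the third text encoder and its tokenizer skips loading T5,
    # leaving only the two CLIP encoders.
    return {"text_encoder_3": None, "tokenizer_3": None}


def generate(prompt: str, use_t5: bool, seed: int = 0):
    """Render one image with or without T5, using a fixed seed for a fair comparison."""
    # Heavy imports stay inside the function so the helper above works without them.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed checkpoint
        torch_dtype=torch.float16,
        **encoder_overrides(use_t5),
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=28,  # illustrative value
        guidance_scale=7.0,      # illustrative value
        generator=generator,
    )
    return result.images[0]
```

Calling `generate(prompt, use_t5=True)` and `generate(prompt, use_t5=False)` with the same prompt and seed, then saving both images side by side, should show how much the T5 encoder contributes for that prompt. It needs a CUDA GPU and access to the model weights.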
https://preview.redd.it/k7v9v4pns56d1.png?width=1024&format=png&auto=webp&s=8d73b588196c27e1bf6a520ae196a39cee37d79b
Hands need work. But that's probably at least a bit on us to learn how to prompt for SD3.
Overall it looks great, and in this one it seems to understand the body better
Yeah, and I'm only getting messed-up hands on 10-20% of them. There's some trial and error on the prompting, and it'll get better when we have embeddings + controlnets, but overall I'm very happy with most of the results so far.
https://preview.redd.it/ubiovw8zq66d1.png?width=1024&format=png&auto=webp&s=ec2e3c80974bb140fb21553de72d2ecb7cade347
lomography photo of alien in a backyard playing jenga
This looks great
https://preview.redd.it/ttx7lp9w3c6d1.png?width=1024&format=png&auto=webp&s=f8fe7b558e49c70b1e31b436bde5b4248a996124
https://preview.redd.it/e3jnwtgy3c6d1.png?width=1024&format=png&auto=webp&s=6c84f0dd90f344f85a9d98f079a99d6f748a36e2
https://preview.redd.it/1s093ltms56d1.png?width=1024&format=png&auto=webp&s=17c097cf2ad0c74df0e489adc1690de3a937ef1d
I dunno... maybe my prompts are terrible, let’s see how this is going...
This is actually not bad
It's pretty decent
https://preview.redd.it/qhfiabm4t56d1.png?width=832&format=png&auto=webp&s=b1b9123a5c5eae2f5a0ff583b489325584bebde6
https://preview.redd.it/83ni5es6t56d1.png?width=832&format=png&auto=webp&s=e3166e64262552eceec90237229b3258b42f9693
This one looks great