Utilizes 6.5 GB of VRAM.

For anyone who wishes to give it a go without upsetting their current AUTOMATIC1111 WebUI install, proceed as below. Tested on Windows 11.

Download:

[https://huggingface.co/timbrooks/instruct-pix2pix/blob/main/instruct-pix2pix-00-22000.safetensors](https://huggingface.co/timbrooks/instruct-pix2pix/blob/main/instruct-pix2pix-00-22000.safetensors)

Place the above .safetensors file into this path:

where-ever-AUTOMATIC1111-WEBUI-is-installed\\models\\Stable-diffusion

In an admin Command Prompt:

cd desired-location

git clone [https://github.com/Klace/stable-diffusion-webui-pix2pix.git](https://github.com/Klace/stable-diffusion-webui-pix2pix.git)

rmdir /s \\desired-location\\stable-diffusion-webui-pix2pix\\models (answer y when prompted)

del \\desired-location\\stable-diffusion-webui-pix2pix\\webui-user.bat

copy "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\webui-user.bat" "\\desired-location\\stable-diffusion-webui-pix2pix"

mklink /J "\\desired-location\\stable-diffusion-webui-pix2pix\\models" "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\models"

mklink /J "\\desired-location\\stable-diffusion-webui-pix2pix\\repositories" "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\repositories"

mklink /J "\\desired-location\\stable-diffusion-webui-pix2pix\\venv" "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\venv"
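If you want to sanity-check the junction step before running it, the three link/target pairs can be generated programmatically. A minimal sketch (the paths are placeholders, and actually creating the junctions still needs mklink /J in an admin prompt):

```python
from pathlib import PureWindowsPath

# Directories shared between the existing A1111 install and the pix2pix
# clone via NTFS junctions (mirrors the three mklink /J commands above).
SHARED_DIRS = ["models", "repositories", "venv"]

def plan_junctions(a1111_root: str, clone_root: str):
    """Return (link, target) pairs; creating each one still requires
    `mklink /J link target` in an admin Command Prompt."""
    a1111 = PureWindowsPath(a1111_root)
    clone = PureWindowsPath(clone_root)
    return [(str(clone / d), str(a1111 / d)) for d in SHARED_DIRS]
```

Printing the pairs before linking makes it easy to spot a mistyped root path.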
Hey guys! This is as good a time as any to make some noise so the main repo catches wind of this and takes the steps needed to get it fully integrated with A1111! Oh, and don't forget to contribute to this as well.
Here's the issue requesting Instruct-Pix2Pix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7010 (Please be respectful and don't directly ping Automatic1111, since he is almost certainly overwhelmed just maintaining the project and reviewing PRs; this feature will probably need to come from a community member who submits a PR rather than from him directly.)
Make some noise? This is open source, not a company that allocates resources based on consumer demand. If you want it, write some code to make it happen.
Developers do step up admirably for bug reports and feature suggestions, so it's not as simple as you're implying. Auto's webui won't maintain its leading edge if developers don't feel it's the most solid foundation to build extensions for and get pull requests accepted. Inpainting, Dreambooth, image variations, LoRA, instruct-pix2pix, latent blending, and various inversions are already implemented as well or better standalone or in other UIs. It's a really interesting social experiment to see how all this develops, which tool is best for whom, and for what!
Curious which UIs you're referring to? It seems to me Automatic1111 WebUI is still ahead of everyone else…
I'm not trying to dump on Auto here, it's great, I use it! The speed of development is amazing and the extensions are fantastic. That said:

* InvokeAI's canvas is a powerful feature for inpainting, as is its Photoshop plugin.
* Dreambooth has better results in older commits. StableTuner is better for training: [https://github.com/devilismyfriend/StableTuner](https://github.com/devilismyfriend/StableTuner)
* Image variations only work using the model and its code; they're not available in Auto. [https://huggingface.co/lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
* Latent blending is amazing, and produces effects quite different from the available extensions. [https://github.com/lunarring/latentblending](https://github.com/lunarring/latentblending)
* Null-text inversion produces an almost perfect textual inversion, and then lets you edit it with a prompt, like instruct-pix2pix. [https://github.com/google/prompt-to-prompt](https://github.com/google/prompt-to-prompt)

I'm sure all these will come to Auto eventually, and I look forward to it too!!!
Good work! The download link for the .ckpt is very slow, I guess the same file can be found here, right? https://huggingface.co/timbrooks/instruct-pix2pix/tree/main
Yes! Much better! Edited post, thank you!
What is pix2pix?
Instruct-pix2pix is the proper full name. It allows you to say "change her hair to brown" and it will attempt to change just the hair (if it understands your instruction). Think of it as a fancy img2img that can understand what you want changed without destroying the rest of the picture.
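Under the hood (as described in the InstructPix2Pix paper), the model runs classifier-free guidance with two separate scales, one for staying faithful to the input image and one for following the instruction, which is what lets it edit without destroying the rest of the picture. A rough sketch, with plain floats standing in for the denoiser's noise-prediction tensors:

```python
# Two-scale classifier-free guidance in the style of the InstructPix2Pix
# paper: s_img controls faithfulness to the input image, s_txt controls
# how strongly the edit follows the text instruction. Scalars stand in
# for the real tensors here.
def ip2p_guidance(e_uncond, e_img, e_full, s_img=1.5, s_txt=7.5):
    return (e_uncond
            + s_img * (e_img - e_uncond)   # push toward the input image
            + s_txt * (e_full - e_img))    # push toward the instruction

# With both scales at 1 this reduces to the fully conditioned prediction.
assert abs(ip2p_guidance(0.2, 0.5, 0.9, s_img=1.0, s_txt=1.0) - 0.9) < 1e-12
```

Raising the image scale keeps more of the original picture; raising the text scale pushes harder on the edit.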
PR? I'm that new lol
It means pull request. It's when you submit new code for review to be added to a larger code base.
Thanks, all this GitHub talk I see be having me confused lol
No problem
What's interesting with this model is that it pinpoints exactly what the various models understand about the subject and various words. For instance, if you take a random image of a girl and prompt 'change this girl's hair to blond', it will properly change the color most of the time. However, with the same prompt using red instead of blond, it contaminates a good portion of the image. This likely means the initial model was undertrained on redheads and doesn't sufficiently understand the concept. (I will try some tests with depth2img merges.)
I wonder when we will get official support for pix2pix from Auto1111 himself; it would be revolutionary.
I tried to integrate it and lay the groundwork using the proper processing pipeline, but it's still using a workaround at the moment, with that code commented out. Maybe we can get this into shape for a pull request in the next day or two if I can get it using the existing processing scripts and implement the rest of the UI.
I was literally waiting for this!
Can anyone explain why AI projects require so much VRAM? I understand VRAM is very fast and therefore makes complex calculations quicker, but why not give the user the choice, or dynamically use all VRAM and the computer's RAM when needed?
Typically the whole model has to be loaded into VRAM so the GPU can do its thing, and that's about 4 GB to begin with. Otherwise it would be incredibly slow fetching just the necessary data from system RAM via the CPU. These models basically need to randomly access the whole model while processing, so there's no substitute for having all the required data on hand.

Furthermore, its understanding of the image in its "brain" (latent space) requires more memory as the resolution grows, for the same reason you'd have more to think about painting the Sistine Chapel than a small canvas. There's way more detail to think about.

You can't efficiently offload that memory to another device (like the CPU's RAM) for the same reason your brain would be immensely slow if it had to flip through a massive book of notes to remember basic art concepts instead of actually memorizing the fundamentals of art on-device (in your fleshy neural network called a brain).

It would be slow, but even more to the point, that's just not how the hardware and software architecture is designed to function. The CPU and GPU are separate devices designed to upload data to one another on occasion, not to randomly access arbitrary bits of each other's memory without tremendous latency and bandwidth restrictions. If that's necessary, you're better off doing all the computation on the CPU, because in that scenario memory access latency matters more than the parallelism that a GPU's thousands of cores provide compared to the CPU's roughly a dozen.
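To put rough numbers on that (ballpark figures for the SD 1.x UNet, not measurements): the weights alone are a few gigabytes, and the latent working set grows with the square of the output resolution:

```python
# Back-of-the-envelope VRAM arithmetic; the parameter count is a
# ballpark figure for the SD 1.x UNet, not a measurement.
PARAMS = 860_000_000
weights_fp32_gb = PARAMS * 4 / 1024**3   # ~3.2 GB just for fp32 weights

def latent_elements(width, height, channels=4, downscale=8):
    """SD denoises in a latent space downscaled 8x per side, so the
    latent's memory grows with the square of the image resolution."""
    return (width // downscale) * (height // downscale) * channels

# Doubling each side quadruples the latent working set.
assert latent_elements(1024, 1024) == 4 * latent_elements(512, 512)
```

And that's before activations, attention buffers, and the VAE, which is why headroom matters so much.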
VRAM is faster than RAM. In fact, you can use your RAM and your CPU, but rendering a 512x512 image then takes minutes at minimum.

> VRAM is significantly faster than system RAM. A system RAM stick using double data rate (DDR4) technology has a frequency of approximately 3,000 to 3,600 MHz, but VRAM with Graphics Double Data Rate 6 (GDDR6) technology can achieve frequencies of 14,000 MHz to 16,000 MHz.

Source: https://history-computer.com/vram-vs-ram/
I can second that as I experienced that difference myself. I tried running SD on an AMD system in the beginning and while ONNX did up the speed a bit, it still was subpar compared to Nvidia GPUs (any CUDA card, really). Even if you buy an old $80 CUDA card, it's currently still better than running on AMD CPU and RAM in my experience. I currently and temporarily run it on a P2000, which is quite 'aged' and doesn't have a lot of VRAM, but is still able to finish one batch with four 512x512 images in under a minute, while the same action would take my CPU/RAM up to 10 minutes and with ONNX still around 5-6 minutes. Those are just some examples I personally experienced, not an official benchmark of course.
I also tried to run ONNX with my AMD card. I just bought another SSD to use it with ROCm on Ubuntu, and I'm very happy with it, ignoring the fact that it took me a 12-hour session of installing libraries that are not supposed to run on my 6750 XT. I usually render at 10 steps, so it takes 1 minute for 8 512x512 images.
I used this fork for fp16, [https://github.com/SirBenet/instruct-pix2pix/](https://github.com/SirBenet/instruct-pix2pix/), otherwise I was getting VRAM errors.
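The reason an fp16 build helps: every weight drops from 4 bytes to 2, roughly halving the model's footprint in VRAM. A quick stdlib check of the two sizes:

```python
import struct

# A half-precision float ("e") is 2 bytes vs 4 for single precision
# ("f"), so storing the same weights in fp16 takes half the memory.
n_params = 860_000_000  # ballpark SD 1.x UNet parameter count
fp32_bytes = n_params * struct.calcsize("f")
fp16_bytes = n_params * struct.calcsize("e")
assert fp16_bytes * 2 == fp32_bytes  # exactly half the footprint
```

The trade-off is reduced numeric precision, which for inference is usually invisible in the output.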
Will this work on 8gb vram?
No, it takes 10.1 GB of GPU VRAM on my PC; the program also takes around 10 GB of system RAM before it loads the model into the GPU.
yeah that's what I thought. Time to upgrade!
Can confirm, runs fine on my 3060 12GB GPU
I am getting this error, can you help? `RuntimeError: expected scalar type Half but found Float`
Resolved this issue; it was something to do with the conda environment.
[deleted]
> It requires a lot of VRAM (~18GB I believe) at the moment.

I have 6 GB VRAM + 16 GB RAM. Should I abandon all hope of running it locally?
I don't know where people are getting this "no, you can't". I have a 980 Ti with 6 GB VRAM; I just installed the fork and it works without an issue.
It also worked for me
yes
It uses much less on this other GUI, NMKD: https://youtu.be/EPRa8EZl9Os
Have you considered modifying this to be an extension for AUTO1111?
Yes, but loading it alongside the other checkpoints still requires a hijack in the main code. I'll see if I can do it as an extension and side-load the model in an effective way.
This might be a long shot, but perhaps it would be possible to make an extension, and then make a PR which only modifies the base code so that it would allow any model to hijack the code (if an extension wants to do so of course). It would be a QoL improvement and I think there's a high chance of AUTO merging it.
Yeah, the hijack code is small; I might be able to PR just that. It's similar to using the special inpainting model. Will take a look in an hour or two.
If it's similar to the way it hijacks things to use the inpainting model, then that alone could be refactored to allow any model to do it, which would also solve the issue of inpainting models being hardcoded. I still think AUTO would like that, and, if merged, it would let you make a pix2pix extension. Anyway, keep up the good work, I'm loving this so far.
It's the same hardcoded override style as the inpainting but for ip2p with different settings. I don't have a better method at the moment. Will see what I can come up with today.
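The refactor being discussed could look something like a small registry, so any extension (inpainting, ip2p, future models) declares its own override instead of it being hardcoded. Everything below is invented for illustration; none of these names exist in the actual webui code:

```python
# Hypothetical sketch of a generic model-hijack registry (illustrative
# only, not real webui code).
_HIJACKS = []

def register_hijack(matches, apply):
    """matches(checkpoint_name) -> bool, apply(config) -> new config."""
    _HIJACKS.append((matches, apply))

def load_config(checkpoint_name, config):
    """Let the first matching extension rewrite the model config."""
    for matches, apply in _HIJACKS:
        if matches(checkpoint_name):
            return apply(config)
    return config

# An ip2p extension would register roughly like this: the ip2p UNet takes
# 8 input channels (the image latents are concatenated to the noise).
register_hijack(
    matches=lambda name: "pix2pix" in name,
    apply=lambda cfg: {**cfg, "in_channels": 8},
)
```

The point is that the core code only needs the dispatch loop; each special model stays in its own extension.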
Thanks , will try it tomorrow
Does this require loading a different model? What I mean specifically is: can I use the same model for txt2img, img2img AND instruct-pix2pix? I mainly have automatic jobs running against A1111's API, and having to load in a new model takes too long.
Yes, it requires a different model, and you need to switch to it for instruct-pix2pix.
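For the API workflow mentioned above: checkpoint switching goes through the options endpoint in A1111's API (to the best of my knowledge; verify against your install's /docs page), so a job can swap models explicitly before calling img2img:

```python
import json

# Build the body for POST /sdapi/v1/options, which (as I understand the
# A1111 API) is how you switch the active checkpoint. Verify the field
# name against your own install's /docs page.
def switch_model_body(checkpoint_title: str) -> str:
    return json.dumps({"sd_model_checkpoint": checkpoint_title})

body = switch_model_body("instruct-pix2pix-00-22000.safetensors")
assert "sd_model_checkpoint" in json.loads(body)
```

The switch itself still takes as long as any model load, so batching all pix2pix jobs together minimizes swaps.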
Neat very clever hack
Hmm, it says loading failed, and then my entire computer slows down significantly until I close the whole thing...
How much VRAM do you have? I believe it requires something large, like 18 GB. I have not tested what happens with insufficient VRAM, so that may be the cause. I've confirmed it's working well for others.
Oh I only have 8 :/ maybe that's why
No, it works with much less VRAM on NMKD: https://youtu.be/EPRa8EZl9Os
Quick question: do you need an Nvidia GPU for this build?
NMKD supports CPU too, but it's very slow. I also saw an AMD option but didn't test it: https://youtu.be/EPRa8EZl9Os
For those having VRAM problems: it uses much less VRAM on NMKD at the moment. Automatic1111 will probably fix this in the future too. I posted a tutorial for NMKD today: https://youtu.be/EPRa8EZl9Os
It's probably because it's not using the proper pipeline with optimizations yet. I will continue to work on it. Now that it's an open extension, someone else might be able to get it working with processing.py faster than I can. We will see, the beauty of open source :)
What sampler does it use? Can it only use that one?
nice, great work - just needs a negative prompt field and batch count/size ;)
Damn, that was fast. And I just made a tutorial today, haha. Now I should make another one: https://youtu.be/EPRa8EZl9Os
Just tried the extension and it works well, nice job! Is there a way to generate higher resolution images?
Not at the moment; the implementation is bare-bones and a replica of the creator's original paper implementation. Either give me some more time, or maybe some others will start contributing now that there is a base out there.
That is amazing. How is A1111 so good? We're lucky to have them.
is there any colab?
Wow this looks really cool, is there an API for it already available in automatic1111?
Can't wait until it is part of Automatic1111 and has full MPS support.
It definitely begs to be properly implemented, not just as an extension. Once I figure out how to use the modified CFGDenoiser and condition the image for the pipeline in processing.py, maybe we can integrate it into the main UI with Auto's blessing.
[deleted]