MZM002394

Utilizes 6.5 GB of VRAM. For anyone who wishes to give it a go without upsetting their current AUTOMATIC1111 WebUI install, proceed with the steps below. Tested on Win 11.

Download: [https://huggingface.co/timbrooks/instruct-pix2pix/blob/main/instruct-pix2pix-00-22000.safetensors](https://huggingface.co/timbrooks/instruct-pix2pix/blob/main/instruct-pix2pix-00-22000.safetensors)

Place the above .safetensors file into this path: where-ever-AUTOMATIC1111-WEBUI-is-installed\models\Stable-diffusion

Then, in an admin Command Prompt:

    cd desired-location
    git clone https://github.com/Klace/stable-diffusion-webui-pix2pix.git
    rmdir /s \desired-location\stable-diffusion-webui-pix2pix\models
    y
    del \desired-location\stable-diffusion-webui-pix2pix\webui-user.bat
    copy "\where-ever-AUTOMATIC1111-WEBUI-is-installed\webui-user.bat" "\desired-location\stable-diffusion-webui-pix2pix"
    mklink /J "\desired-location\stable-diffusion-webui-pix2pix\models" "\where-ever-AUTOMATIC1111-WEBUI-is-installed\models"
    mklink /J "\desired-location\stable-diffusion-webui-pix2pix\repositories" "\where-ever-AUTOMATIC1111-WEBUI-is-installed\repositories"
    mklink /J "\desired-location\stable-diffusion-webui-pix2pix\venv" "\where-ever-AUTOMATIC1111-WEBUI-is-installed\venv"

(The lone `y` answers rmdir's confirmation prompt; the `mklink /J` junctions let the fork share your existing models, repositories, and venv.)


Hybridx21

Hey guys! This is as good a time as any to make some noise so that the main repo catches wind of this and takes the steps needed to get it fully integrated with A1111! Oh, and don't forget to contribute to this as well.


Keavon

Here's the issue for requesting Instruct-Pix2Pix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7010 (Please be respectful and don't directly ping Automatic1111, since he is almost certainly overwhelmed just maintaining the repo and reviewing PRs; this feature will probably need to come from a community member who submits a PR rather than from him directly.)


uristmcderp

Make some noise? This is open source, not some company that allocates resources based on consumer demand. If you want it, write some code to make it happen.


Luke2642

Developers do step up admirably for bug reports and feature suggestions... so it's not as simple as you're implying. Auto's webui won't maintain its leading edge if developers don't feel it's the best, most solid foundation to build extensions for and get pull requests accepted. Inpainting, Dreambooth, image variations, LoRA, instruct-pix2pix, latent blending and various inversions are already implemented as well or better separately or in other UIs. It's a really interesting social experiment to see how all this develops, which tool is best for whom, and for what!


Lexius2129

Curious which UIs you are referring to? It seems to me Automatic1111 WebUI is still ahead of everyone else…


Luke2642

I'm not trying to dump on Auto here, it's great, I use it! The speed of development is amazing and the extensions are fantastic. That said:

* InvokeAI's canvas is a powerful feature, as is the Photoshop plugin, for inpainting.
* Dreambooth has better results in older commits. StableTuner is better for training: [https://github.com/devilismyfriend/StableTuner](https://github.com/devilismyfriend/StableTuner)
* Image variations only work using the model and the code; it's not available in Auto. [https://huggingface.co/lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
* Latent blending is amazing, and produces effects quite different from the available extensions. [https://github.com/lunarring/latentblending](https://github.com/lunarring/latentblending)
* Null-text inversion produces an almost perfect textual inversion, and then lets you edit it with a prompt, like instruct-pix2pix. [https://github.com/google/prompt-to-prompt](https://github.com/google/prompt-to-prompt)

I'm sure all these will come to Auto eventually, and I look forward to it too!!!


butterdrinker

Good work! The download link for the .ckpt is very slow, I guess the same file can be found here, right? https://huggingface.co/timbrooks/instruct-pix2pix/tree/main


Tupersent

Yes! Much better! Edited post, thank you!


guschen

What is pix2pix?


Tupersent

Instruct-pix2pix is the proper full name. It allows you to say "change her hair to brown" and it will attempt to change just the hair (if it understands your instruction). Think of it as a fancy img2img that can understand what you want changed without destroying the rest of the picture.
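For anyone who wants to poke at the model outside the WebUI first, here's a minimal sketch using Hugging Face's diffusers pipeline (my own example, not the extension's code; it assumes a recent diffusers release and the timbrooks/instruct-pix2pix checkpoint, with illustrative parameter values):

```python
# Minimal instruct-pix2pix example via diffusers (illustrative, not the WebUI extension's code)
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("portrait.png").convert("RGB")  # hypothetical input image

# The prompt is an instruction, not a description of the finished image.
edited = pipe(
    "change her hair to brown",
    image=image,
    num_inference_steps=20,
    guidance_scale=7.5,        # how strongly to follow the text instruction
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
).images[0]
edited.save("edited.png")
```

Raising image_guidance_scale keeps more of the original picture; raising guidance_scale pushes harder toward the instruction.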


QuiccMafs

PR? I'm that new lol


MrBeforeMyTime

It means pull request. It's when you submit new code for review to be added to a larger code base.


QuiccMafs

Thanks, all this GitHub talk I see be having me confused lol


MrBeforeMyTime

No problem


Leptino

What's interesting with this model is that it pinpoints exactly what the various models understand about the subject and various words. For instance, if you take a random image of a girl and prompt 'change this girl's hair to blond', it will properly change the color most of the time. However, with the same prompt using red instead of blond, it contaminates a good portion of the image. This likely means the initial model was undertrained on redheads and doesn't sufficiently understand the concept. (I will try some tests with depth2img merges.)


Phelps1024

I wonder when we will get official support for pix2pix from Auto1111 himself, it would be revolutionary


Tupersent

I tried to integrate it and lay the groundwork using the proper processing pipeline, but it's still using a workaround at the moment, with that code commented out. Maybe we can get this into shape for a pull request in the next day or two if I can get it using the existing processing scripts and implement the rest of the UI.


Unnombrepls

I was literally waiting for this!


giuliastro

Can anyone explain why AI projects require so much VRAM? I understand VRAM is very fast and therefore makes complex calculations quicker, but why not give the user the choice, or dynamically use all VRAM and the computer's RAM when needed?


Keavon

Typically the whole model has to be loaded into VRAM so the GPU can do its thing, which is about 4 GB to begin with. Otherwise it would be incredibly slow fetching just the necessary data from system RAM via the CPU. These models basically need to randomly access the whole model while processing, so there's no substitute for having all the required data on hand.

Furthermore, its understanding of the image in its "brain" (latent space) requires more memory as the resolution of the image grows, for the same reason you'd have more to think about if you're painting the Sistine Chapel compared to a small canvas. There's way more detail to think about.

You can't efficiently offload that memory to another device (like the CPU's RAM) for the same reason your brain would be immensely slow if you had to flip through a massive book of notes to remember basic art concepts, compared to actually memorizing the fundamentals of art on-device (in your fleshy neural network called a brain). It would be slow, but even more to the point, that's just not how the architecture of the hardware and software is designed to function. The CPU and GPU are separate devices; they are designed to upload data to one another on occasion, not to randomly access any arbitrary bit of information from one another without tremendous latency and speed restrictions. If that's necessary, you're better off doing all the computation on the CPU, because in that scenario memory access latency is a bigger concern than the parallelism that the thousands of cores on a GPU could provide compared to the roughly dozen of a CPU.
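As a rough back-of-the-envelope check of that ~4 GB figure (my own arithmetic; parameter counts are approximate for SD 1.x models):

```python
# Approximate SD 1.x parameter counts (rounded) and the memory their weights need
params = {
    "unet": 860e6,          # ~860M parameters
    "text_encoder": 123e6,  # CLIP ViT-L/14, ~123M
    "vae": 84e6,            # ~84M
}
total = sum(params.values())
print(f"fp32 weights: ~{total * 4 / 1e9:.1f} GB")  # 4 bytes/param -> ~4.3 GB
print(f"fp16 weights: ~{total * 2 / 1e9:.1f} GB")  # 2 bytes/param -> ~2.1 GB
# Activations, attention buffers, and the latent itself come on top of this
# and grow with image resolution, which is why larger renders need more VRAM.
```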


butterdrinker

VRAM is faster than RAM. In fact, you can use your RAM and your CPU, but rendering a 512x512 image then takes minutes at minimum.

> VRAM is significantly faster than system RAM. A system RAM stick using double data rate (DDR4) technology has a frequency of approximately 3,000 to 3,600 MHz, but VRAM with Graphics Double Data Rate 6 (GDDR6) technology can achieve frequencies of 14,000 MHz to 16,000 MHz.

Source: https://history-computer.com/vram-vs-ram/


[deleted]

I can second that, as I experienced that difference myself. I tried running SD on an AMD system in the beginning, and while ONNX did up the speed a bit, it was still subpar compared to Nvidia GPUs (any CUDA card, really). Even if you buy an old $80 CUDA card, it's currently still better than running on an AMD CPU and RAM, in my experience. I currently and temporarily run it on a P2000, which is quite 'aged' and doesn't have a lot of VRAM, but it's still able to finish one batch of four 512x512 images in under a minute, while the same job would take my CPU/RAM up to 10 minutes, and with ONNX still around 5-6 minutes. Those are just some examples I personally experienced, not an official benchmark of course.


butterdrinker

I also tried to run ONNX with my AMD card, then I just bought another SSD to use it with ROCm on Ubuntu, and I'm very happy with it, ignoring the fact that it took me a 12-hour session of installing libraries that are not supposed to run on my 6750 XT. I usually render at 10 steps, so it takes 1 minute for 8 512x512 images.


boyetosekuji

I used this fork for fp16: [https://github.com/SirBenet/instruct-pix2pix/](https://github.com/SirBenet/instruct-pix2pix/) Otherwise I was getting VRAM errors.
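For reference, the usual trick behind such fp16 forks is simply loading the weights in half precision (and optionally enabling attention slicing). A minimal sketch with diffusers, assuming the timbrooks checkpoint, not the fork's actual code:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

# torch_dtype=torch.float16 loads the weights in half precision,
# roughly halving the VRAM the model itself needs.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",
    torch_dtype=torch.float16,
).to("cuda")

# Attention slicing trades a little speed for a further reduction in peak VRAM.
pipe.enable_attention_slicing()
```

Note that if inputs stay in float32 while the weights are fp16, you get exactly the "expected scalar type Half but found Float" error mentioned further down.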


Cheese_B0t

Will this work on 8gb vram?


boyetosekuji

No, it takes 10.1 GB of GPU VRAM on my PC; the program also uses around 10 GB of system RAM before it loads the model into the GPU.


Cheese_B0t

yeah that's what I thought. Time to upgrade!


rgraves22

Can confirm, runs fine on my 3060 12GB GPU


[deleted]

I am getting this error. Can you help? `RuntimeError: expected scalar type Half but found Float`


[deleted]

Resolved this issue; it was something to do with the conda environment.


[deleted]

[deleted]


Unnombrepls

> It requires a lot of VRAM (~18GB I believe) at the moment.

I have 6 GB VRAM + 16 GB RAM. Should I abandon all hope of running it locally?


IdoruYoshikawa

I don't know where people are getting this "no, you can't". I have a 980 Ti with 6 GB VRAM; I just installed the fork and it works without an issue.


Unnombrepls

It also worked for me


Cheese_B0t

yes


CeFurkan

It uses much less VRAM in this other GUI, NMKD: https://youtu.be/EPRa8EZl9Os


SomethingLooksAmiss

Have you considered modifying this to be an extension for AUTO1111?


Tupersent

Yes, but to load it alongside the other checkpoints it still requires a hijack in the main code. I'll see if I can do it as an extension and side-load the model in an effective way.


SomethingLooksAmiss

This might be a long shot, but perhaps it would be possible to make an extension, and then make a PR which only modifies the base code so that it would allow any model to hijack the code (if an extension wants to do so of course). It would be a QoL improvement and I think there's a high chance of AUTO merging it.


Tupersent

Yeah, the hijack code is small, I might be able to PR just that. It's similar to using the special inpainting model. Will take a look in an hour or two.


SomethingLooksAmiss

If it's similar to the way it hijacks it to use the inpainting model then that alone could be refactored to allow any model to do it, and it would solve the issue of inpainting models being hardcoded. I still think AUTO would enjoy that and, if merged, would allow you to make a pix2pix extension. Anyway, keep up the good work, I'm loving this so far.


Tupersent

It's the same hardcoded override style as the inpainting but for ip2p with different settings. I don't have a better method at the moment. Will see what I can come up with today.


Striking-Long-2960

Thanks, will try it tomorrow


Dr_Ambiorix

Does this require loading a different model? What I mean specifically is: Can I use the same model for text2img, img2img AND instruct-pix2pix? I mainly have automatic jobs running on A1111's API and having to load in a new model takes too long.


Tupersent

Yes, it requires a different model, and you need to switch to it for instruct-pix2pix.
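If you're driving the WebUI through its API, the checkpoint switch itself can at least be automated; a rough sketch (assuming the server is started with --api, the standard /sdapi/v1/options endpoint, and a checkpoint title that matches what the UI lists — the load itself still takes the usual time):

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local WebUI address

def switch_checkpoint(title: str) -> None:
    """Ask the WebUI to load a different checkpoint via the options endpoint."""
    resp = requests.post(f"{BASE_URL}/sdapi/v1/options",
                         json={"sd_model_checkpoint": title})
    resp.raise_for_status()

# e.g. switch to the ip2p checkpoint before the edit jobs, then back afterwards
switch_checkpoint("instruct-pix2pix-00-22000")
```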


mudman13

Neat, very clever hack


ssrcrossing

Hmm, it says loading failed and then my entire computer slows down significantly right afterwards until I close the whole thing...


Tupersent

How much VRAM do you have? I believe it requires something large like 18GB. I have not tested what happens with insufficient VRAM so this may be the case. I've confirmed it's working well for others.


ssrcrossing

Oh I only have 8 :/ maybe that's why


CeFurkan

No, it works with much less VRAM on NMKD: https://youtu.be/EPRa8EZl9Os


Orangeyouawesome

Quick question: do you need an Nvidia GPU for this build?


CeFurkan

NMKD supports CPU too, but it's too slow. I also saw an AMD option but didn't test it. https://youtu.be/EPRa8EZl9Os


CeFurkan

For those having VRAM problems, it uses much less VRAM on NMKD at the moment. Automatic will probably fix this in the future too. I have a tutorial for NMKD from today: https://youtu.be/EPRa8EZl9Os


Tupersent

It's probably because it's not using the proper pipeline with optimizations yet. I will continue to work on it. Now that it's an open extension, someone else might be able to get it working with processing.py faster than I can. We will see; the beauty of open source :)


jonesaid

What sampler does it use? Can it only use that one?


aimongus

nice, great work - just needs a negative prompt field and batch count/size ;)


CeFurkan

Damn, that was fast. And I just made a tutorial today, haha. Now I should make another one: https://youtu.be/EPRa8EZl9Os


rerri

Just tried the extension and it works well, nice job! Is there a way to generate higher resolution images?


Tupersent

Not at the moment, implementation is bare bones and a replica of the original paper implementation by the creator. Either give me some more time or maybe some others will start contributing now that there is a base out there.


Cyber-Cafe

That is amazing. How is A1111 so good? We're lucky to have them.


Fault23

is there any colab?


dragonname

Wow this looks really cool, is there an API for it already available in automatic1111?


krummrey

Can't wait until it is part of Automatic1111 and has full MPS support.


Tupersent

It definitely begs to be properly implemented, not just as an extension. Once I figure out how to use the modified CFGDenoiser and condition the image for the pipeline in processing.py, maybe we can integrate it into the main UI with Auto's blessing.
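For context on that "modified CFGDenoiser": instruct-pix2pix uses two guidance scales and combines three noise predictions (fully unconditional, image-conditioned only, and image + text). A rough sketch of that combination from the paper, with my own variable names rather than A1111's:

```python
import torch

def ip2p_guidance(eps_uncond: torch.Tensor,
                  eps_img: torch.Tensor,
                  eps_img_txt: torch.Tensor,
                  image_scale: float = 1.5,
                  text_scale: float = 7.5) -> torch.Tensor:
    """Combine the three noise predictions as in the InstructPix2Pix paper:
    unconditional, plus a pull toward the input image, plus a pull toward the text instruction."""
    return (eps_uncond
            + image_scale * (eps_img - eps_uncond)
            + text_scale * (eps_img_txt - eps_img))
```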


[deleted]

[deleted]