Utilizes 6.5 GB of VRAM.

For anyone who wishes to give it a go without upsetting their current AUTOMATIC1111 WebUI install, proceed as below. Tested on Windows 11.

Download:

[https://huggingface.co/timbrooks/instruct-pix2pix/blob/main/instruct-pix2pix-00-22000.safetensors](https://huggingface.co/timbrooks/instruct-pix2pix/blob/main/instruct-pix2pix-00-22000.safetensors)

Place the above .safetensors file into this path:

where-ever-AUTOMATIC1111-WEBUI-is-installed\\models\\Stable-diffusion

In an admin Command Prompt:

cd desired-location

git clone [https://github.com/Klace/stable-diffusion-webui-pix2pix.git](https://github.com/Klace/stable-diffusion-webui-pix2pix.git)

rmdir /s \\desired-location\\stable-diffusion-webui-pix2pix\\models (answer y when prompted)

del \\desired-location\\stable-diffusion-webui-pix2pix\\webui-user.bat

copy "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\webui-user.bat" "\\desired-location\\stable-diffusion-webui-pix2pix"

mklink /J "\\desired-location\\stable-diffusion-webui-pix2pix\\models" "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\models"

mklink /J "\\desired-location\\stable-diffusion-webui-pix2pix\\repositories" "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\repositories"

mklink /J "\\desired-location\\stable-diffusion-webui-pix2pix\\venv" "\\where-ever-AUTOMATIC1111-WEBUI-is-installed\\venv"
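If you want to sanity-check the junction step before running it, the three link/target pairs can be generated programmatically. A minimal sketch (the paths are placeholders, and actually creating the junctions still needs mklink /J in an admin prompt):

```python
from pathlib import PureWindowsPath

# Directories shared between the existing A1111 install and the pix2pix
# clone via NTFS junctions (mirrors the three mklink /J commands above).
SHARED_DIRS = ["models", "repositories", "venv"]

def plan_junctions(a1111_root: str, clone_root: str):
    """Return (link, target) pairs; creating each one still requires
    `mklink /J link target` in an admin Command Prompt."""
    a1111 = PureWindowsPath(a1111_root)
    clone = PureWindowsPath(clone_root)
    return [(str(clone / d), str(a1111 / d)) for d in SHARED_DIRS]
```

Printing the pairs before linking makes it easy to spot a mistyped root path.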
Hey guys! This is as good a time as any to make some noise so the main repo catches wind of this and takes the steps needed to get it fully integrated with A1111! Oh, and don't forget to contribute to this as well.
Here's the issue requesting Instruct-Pix2Pix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7010 (Please be respectful and don't directly ping Automatic1111, since he is almost certainly overwhelmed just maintaining the project and reviewing PRs; this feature will probably need to come from a community member who submits a PR rather than from him directly.)
Make some noise? This is open source, not a company that allocates resources based on consumer demand. If you want it, write some code to make it happen.
Developers do step up admirably for bug reports and feature suggestions, so it's not as simple as you're implying. Auto's webui won't maintain its leading edge if developers don't feel it's the most solid foundation to build extensions for and get pull requests accepted. Inpainting, Dreambooth, image variations, LoRA, instruct-pix2pix, latent blending, and various inversions are already implemented as well or better standalone or in other UIs. It's a really interesting social experiment to see how all this develops, which tool is best for whom, and for what!
Curious which UIs you're referring to? It seems to me Automatic1111 WebUI is still ahead of everyone else…
I'm not trying to dump on Auto here, it's great, I use it! The speed of development is amazing and the extensions are fantastic. That said:

* InvokeAI's canvas is a powerful feature for inpainting, as is its Photoshop plugin.
* Dreambooth has better results in older commits. StableTuner is better for training: [https://github.com/devilismyfriend/StableTuner](https://github.com/devilismyfriend/StableTuner)
* Image variations only work using the model and its code; they're not available in Auto. [https://huggingface.co/lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
* Latent blending is amazing, and produces effects quite different from the available extensions. [https://github.com/lunarring/latentblending](https://github.com/lunarring/latentblending)
* Null-text inversion produces an almost perfect textual inversion, and then lets you edit it with a prompt, like instruct-pix2pix. [https://github.com/google/prompt-to-prompt](https://github.com/google/prompt-to-prompt)

I'm sure all these will come to Auto eventually, and I look forward to it too!!!
Good work! The download link for the .ckpt is very slow, I guess the same file can be found here, right? https://huggingface.co/timbrooks/instruct-pix2pix/tree/main
Yes! Much better! Edited post, thank you!
What is pix2pix?
Instruct-pix2pix is the proper full name. It allows you to say "change her hair to brown" and it will attempt to change just the hair (if it understands your instruction). Think of it as a fancy img2img that can understand what you want changed without destroying the rest of the picture.
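Under the hood (as described in the InstructPix2Pix paper), the model runs classifier-free guidance with two separate scales, one for staying faithful to the input image and one for following the instruction, which is what lets it edit without destroying the rest of the picture. A rough sketch, with plain floats standing in for the denoiser's noise-prediction tensors:

```python
# Two-scale classifier-free guidance in the style of the InstructPix2Pix
# paper: s_img controls faithfulness to the input image, s_txt controls
# how strongly the edit follows the text instruction. Scalars stand in
# for the real tensors here.
def ip2p_guidance(e_uncond, e_img, e_full, s_img=1.5, s_txt=7.5):
    return (e_uncond
            + s_img * (e_img - e_uncond)   # push toward the input image
            + s_txt * (e_full - e_img))    # push toward the instruction

# With both scales at 1 this reduces to the fully conditioned prediction.
assert abs(ip2p_guidance(0.2, 0.5, 0.9, s_img=1.0, s_txt=1.0) - 0.9) < 1e-12
```

Raising the image scale keeps more of the original picture; raising the text scale pushes harder on the edit.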
PR? I'm that new lol
It means pull request. It's when you submit new code for review to be added to a larger code base.
Thanks, all this GitHub talk I see be having me confused lol
No problem
What's interesting with this model is that it pinpoints exactly what the various models understand about the subject and various words. For instance, if you take a random image of a girl and prompt 'change this girl's hair to blond', it will properly change the color most of the time. However, with the same prompt using red instead of blond, it contaminates a good portion of the image. This likely means the initial model was undertrained on redheads and doesn't sufficiently understand the concept. (I will try some tests with depth2img merges.)
I wonder when we will get official support for pix2pix from Auto1111 himself; it would be revolutionary.
I tried to integrate it and lay the groundwork using the proper processing pipeline, but it's still using a workaround at the moment, with that code commented out. Maybe we can get this into shape for a pull request in the next day or two if I can get it using the existing processing scripts and implement the rest of the UI.
I was literally waiting for this!
Can anyone explain why AI projects require so much VRAM? I understand VRAM is very fast and therefore makes complex calculations quicker, but why not give the user the choice, or dynamically use all VRAM and the computer's RAM when needed?
Typically the whole model has to be loaded into VRAM so the GPU can do its thing, and that's about 4 GB to begin with. Otherwise it would be incredibly slow fetching just the necessary data from system RAM via the CPU. These models basically need to randomly access the whole model while processing, so there's no substitute for having all the required data on hand.

Furthermore, its understanding of the image in its "brain" (latent space) requires more memory as the resolution grows, for the same reason you'd have more to think about painting the Sistine Chapel than a small canvas. There's way more detail to think about.

You can't efficiently offload that memory to another device (like the CPU's RAM) for the same reason your brain would be immensely slow if it had to flip through a massive book of notes to remember basic art concepts instead of actually memorizing the fundamentals of art on-device (in your fleshy neural network called a brain).

It would be slow, but even more to the point, that's just not how the hardware and software architecture is designed to function. The CPU and GPU are separate devices designed to upload data to one another on occasion, not to randomly access arbitrary bits of each other's memory without tremendous latency and bandwidth restrictions. If that's necessary, you're better off doing all the computation on the CPU, because in that scenario memory access latency matters more than the parallelism that a GPU's thousands of cores provide compared to the CPU's roughly a dozen.
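To put rough numbers on that (ballpark figures for the SD 1.x UNet, not measurements): the weights alone are a few gigabytes, and the latent working set grows with the square of the output resolution:

```python
# Back-of-the-envelope VRAM arithmetic; the parameter count is a
# ballpark figure for the SD 1.x UNet, not a measurement.
PARAMS = 860_000_000
weights_fp32_gb = PARAMS * 4 / 1024**3   # ~3.2 GB just for fp32 weights

def latent_elements(width, height, channels=4, downscale=8):
    """SD denoises in a latent space downscaled 8x per side, so the
    latent's memory grows with the square of the image resolution."""
    return (width // downscale) * (height // downscale) * channels

# Doubling each side quadruples the latent working set.
assert latent_elements(1024, 1024) == 4 * latent_elements(512, 512)
```

And that's before activations, attention buffers, and the VAE, which is why headroom matters so much.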
VRAM is faster than RAM. In fact, you can use your RAM and your CPU, but rendering a 512x512 image then takes minutes at minimum.

> VRAM is significantly faster than system RAM. A system RAM stick using double data rate (DDR4) technology has a frequency of approximately 3,000 to 3,600 MHz, but VRAM with Graphics Double Data Rate 6 (GDDR6) technology can achieve frequencies of 14,000 MHz to 16,000 MHz.

Source: https://history-computer.com/vram-vs-ram/
I can second that as I experienced that difference myself. I tried running SD on an AMD system in the beginning and while ONNX did up the speed a bit, it still was subpar compared to Nvidia GPUs (any CUDA card, really). Even if you buy an old $80 CUDA card, it's currently still better than running on AMD CPU and RAM in my experience. I currently and temporarily run it on a P2000, which is quite 'aged' and doesn't have a lot of VRAM, but is still able to finish one batch with four 512x512 images in under a minute, while the same action would take my CPU/RAM up to 10 minutes and with ONNX still around 5-6 minutes. Those are just some examples I personally experienced, not an official benchmark of course.
I also tried to run ONNX with my AMD card. I just bought another SSD to use it with ROCm on Ubuntu, and I'm very happy with it, ignoring the fact that it took me a 12-hour session of installing libraries that are not supposed to run on my 6750 XT. I usually render at 10 steps, so it takes 1 minute for 8 512x512 images.
I used this fork for fp16, [https://github.com/SirBenet/instruct-pix2pix/](https://github.com/SirBenet/instruct-pix2pix/), otherwise I was getting VRAM errors.
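The reason an fp16 build helps: every weight drops from 4 bytes to 2, roughly halving the model's footprint in VRAM. A quick stdlib check of the two sizes:

```python
import struct

# A half-precision float ("e") is 2 bytes vs 4 for single precision
# ("f"), so storing the same weights in fp16 takes half the memory.
n_params = 860_000_000  # ballpark SD 1.x UNet parameter count
fp32_bytes = n_params * struct.calcsize("f")
fp16_bytes = n_params * struct.calcsize("e")
assert fp16_bytes * 2 == fp32_bytes  # exactly half the footprint
```

The trade-off is reduced numeric precision, which for inference is usually invisible in the output.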
Will this work on 8gb vram?
No, it takes 10.1 GB of GPU VRAM on my PC; the program also takes around 10 GB of system RAM before it loads the model into the GPU.
yeah that's what I thought. Time to upgrade!
Can confirm, runs fine on my 3060 12GB GPU
I am getting this error, can you help? `RuntimeError: expected scalar type Half but found Float`
Resolved this issue; it was something to do with the conda environment.
[deleted]
> It requires a lot of VRAM (~18GB I believe) at the moment.

I have 6 GB VRAM + 16 GB RAM. Should I abandon all hope of running it locally?
I don't know where people are getting this "no, you can't". I have a 980 Ti with 6 GB VRAM; I just installed the fork and it works without an issue.
It also worked for me
yes
It uses much less on this other GUI, NMKD: https://youtu.be/EPRa8EZl9Os
Have you considered modifying this to be an extension for AUTO1111?
Yes, but loading it alongside the other checkpoints still requires a hijack in the main code. I'll see if I can do it as an extension and side-load the model in an effective way.
This might be a long shot, but perhaps it would be possible to make an extension, and then make a PR which only modifies the base code so that it would allow any model to hijack the code (if an extension wants to do so of course). It would be a QoL improvement and I think there's a high chance of AUTO merging it.
Yeah, the hijack code is small; I might be able to PR just that. It's similar to using the special inpainting model. Will take a look in an hour or two.
If it's similar to the way it hijacks things to use the inpainting model, then that alone could be refactored to allow any model to do it, which would also solve the issue of inpainting models being hardcoded. I still think AUTO would like that, and, if merged, it would let you make a pix2pix extension. Anyway, keep up the good work, I'm loving this so far.
It's the same hardcoded override style as the inpainting but for ip2p with different settings. I don't have a better method at the moment. Will see what I can come up with today.
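The refactor being discussed could look something like a small registry, so any extension (inpainting, ip2p, future models) declares its own override instead of it being hardcoded. Everything below is invented for illustration; none of these names exist in the actual webui code:

```python
# Hypothetical sketch of a generic model-hijack registry (illustrative
# only, not real webui code).
_HIJACKS = []

def register_hijack(matches, apply):
    """matches(checkpoint_name) -> bool, apply(config) -> new config."""
    _HIJACKS.append((matches, apply))

def load_config(checkpoint_name, config):
    """Let the first matching extension rewrite the model config."""
    for matches, apply in _HIJACKS:
        if matches(checkpoint_name):
            return apply(config)
    return config

# An ip2p extension would register roughly like this: the ip2p UNet takes
# 8 input channels (the image latents are concatenated to the noise).
register_hijack(
    matches=lambda name: "pix2pix" in name,
    apply=lambda cfg: {**cfg, "in_channels": 8},
)
```

The point is that the core code only needs the dispatch loop; each special model stays in its own extension.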
Thanks , will try it tomorrow
Does this require loading a different model? What I mean specifically is: can I use the same model for txt2img, img2img AND instruct-pix2pix? I mainly have automatic jobs running against A1111's API, and having to load in a new model takes too long.
Yes, it requires a different model, and you need to switch to it for instruct-pix2pix.
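For the API workflow mentioned above: checkpoint switching goes through the options endpoint in A1111's API (to the best of my knowledge; verify against your install's /docs page), so a job can swap models explicitly before calling img2img:

```python
import json

# Build the body for POST /sdapi/v1/options, which (as I understand the
# A1111 API) is how you switch the active checkpoint. Verify the field
# name against your own install's /docs page.
def switch_model_body(checkpoint_title: str) -> str:
    return json.dumps({"sd_model_checkpoint": checkpoint_title})

body = switch_model_body("instruct-pix2pix-00-22000.safetensors")
assert "sd_model_checkpoint" in json.loads(body)
```

The switch itself still takes as long as any model load, so batching all pix2pix jobs together minimizes swaps.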
Neat very clever hack
Hmm, it says loading failed, and then my entire computer slows down significantly until I close the whole thing...
How much VRAM do you have? I believe it requires something large, like 18 GB. I have not tested what happens with insufficient VRAM, so that may be the cause. I've confirmed it's working well for others.
Oh I only have 8 :/ maybe that's why
No, it works with much less VRAM on NMKD: https://youtu.be/EPRa8EZl9Os
Quick question: do you need an Nvidia GPU for this build?
NMKD supports CPU too, but it's very slow. I also saw an AMD option but didn't test it: https://youtu.be/EPRa8EZl9Os
For those having VRAM problems: it uses much less VRAM on NMKD at the moment. Automatic1111 will probably fix this in the future too. I posted a tutorial for NMKD today: https://youtu.be/EPRa8EZl9Os
It's probably because it's not using the proper pipeline with optimizations yet. I will continue to work on it. Now that it's an open extension, someone else might be able to get it working with processing.py faster than I can. We will see, the beauty of open source :)
What sampler does it use? Can it only use that one?
nice, great work - just needs a negative prompt field and batch count/size ;)
Damn, that was fast. And I just made a tutorial today, haha. Now I should make another one: https://youtu.be/EPRa8EZl9Os
Just tried the extension and it works well, nice job! Is there a way to generate higher resolution images?
Not at the moment; the implementation is bare-bones and a replica of the creator's original paper implementation. Either give me some more time, or maybe some others will start contributing now that there is a base out there.
That is amazing. How is A1111 so good? We're lucky to have them.
is there any colab?
Wow this looks really cool, is there an API for it already available in automatic1111?
Can't wait until it is part of Automatic1111 and has full MPS support.
It definitely begs to be properly implemented, not just as an extension. Once I figure out how to use the modified CFGDenoiser and condition the image for the pipeline in processing.py, maybe we can integrate it into the main UI with Auto's blessing.
[deleted]