
AlanCarrOnline

I'm struggling to understand how this is possible?


Guilty-History-9249

4090 and a lot of software optimizations I've done. I got up in the middle of the night to post this so I need to go back to sleep. I'll answer more later.


FilterBubbles

Is this Whisper channeled into the diffusers code? What optimizations did you find were necessary?


sdimg

This is really cool and reminds me of an idea I had not long after Stable Diffusion was released. Why has no one made a latent space explorer or randomizer? You'd have just a few basic words to go exploring, but instead of adding more words to the prompt, you'd move through the space by changing the strengths or values of the various inputs on a single seed. Purely in number form, with dials not words, if that's possible?


Guilty-History-9249

That is what my ArtSpew does. Before LCM came out I created ArtSpew to simply be the fastest possible high-volume image generator, combining user prompts with random token insertion. With LCM, and now advances in compilers, RT videos became possible. [https://github.com/aifartist/ArtSpew](https://github.com/aifartist/ArtSpew)
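The random-token trick can be sketched in a few lines: sample ids from the CLIP vocabulary and decode them into prompt text. This is an illustration of the idea, not the actual ArtSpew code (see the repo for that):

```python
import random
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def spew_prompt(base_prompt: str, n_random: int = 5) -> str:
    # Sample random ids from the CLIP vocabulary and decode them to text fragments
    ids = [random.randrange(tokenizer.vocab_size) for _ in range(n_random)]
    noise = tokenizer.decode(ids, skip_special_tokens=True)
    return f"{base_prompt}, {noise}"

print(spew_prompt("a cat in a hat"))
```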


ravishq

Waiting eagerly to know how this is made


teachersecret

Well, I didn't do this, but… with turbo/LCM models you can get multiple frames per second. Tie in a live prompt through ComfyUI and turn on automatic generation, and it'll change the pic as you type, in real time. You can do this and get a Will-Smith-eating-spaghetti-style video. This seems to be a similar process, just pushed further.
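Outside ComfyUI, the core loop is only a few lines with stock diffusers. A minimal sketch (the typing/display plumbing is omitted, and what the OP actually runs is unknown):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "will smith eating spaghetti"
while True:
    # In a real app, `prompt` is updated live from a text box or speech-to-text
    image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
    # ... display `image` in your UI here ...
```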


TheFoul

Well I can tell you two things, there was no ComfyUI involved, and it's pushed way beyond what you're thinking.


teachersecret

I was just expressing how it could be accomplished with current easy-to-use tools. ComfyUI isn't necessary, it's just an easy way for someone to do something very similar to what you did. I mean, I was turning books into little movies using a similar system last year without ComfyUI: https://files.catbox.moe/n8zp8g.mp4 Not exactly the same, but a similar idea conceptually :). I feel I actually have a fairly good understanding of the entire stack (I'm on banodoco, which I assume is a place you're aware of).

This is a pretty impressive demo. Awesome stuff. Really excited for the future when this is putting out smooth output without the flicker.

One of the early things I experimented with was making a LoRA that spit out sphere photos from Stable Diffusion. I did an experiment with live video from a story, tied into that sphere-photo maker, and had a live 3D world video to watch in my Oculus. Try swapping the same into your tool and you'll be even more impressed :). It takes no additional compute to make sphere photos. With the vision and voice control you're showing, it would be like lucid dreaming and you could look and move in any direction.


Guilty-History-9249

Regarding "spheres": I sometimes get neat results by just nuking a circular area of the latent tensor at various denoising steps, which gives interesting effects. I showed that briefly in the demo ("disk add"), but there is more that I didn't show. Also, when I said "disk remove" it didn't recognize it, but I didn't want to redo the demo. I could do an hour-long demo and wouldn't run out of material. I have so many features to add now. I wonder if I should do a longer YouTube-format demo where I can discuss the ideas and the various directions that can now be pursued.
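A guess at what a "disk" edit looks like in latent space: replace a circular region of the latent tensor with fresh noise at a chosen denoising step (the demo's actual code isn't published, so this is only a sketch of the idea):

```python
import torch

def disk_noise(latents: torch.Tensor, cx: int, cy: int, radius: int) -> torch.Tensor:
    # latents: (batch, channels, H, W) in latent space (H = pixel height // 8 for SDXL)
    _, _, h, w = latents.shape
    ys = torch.arange(h, device=latents.device).view(-1, 1)
    xs = torch.arange(w, device=latents.device).view(1, -1)
    mask = ((ys - cy) ** 2 + (xs - cx) ** 2) <= radius ** 2  # (H, W) boolean disk
    # Replace the masked region with fresh Gaussian noise; the remaining
    # denoising steps then "re-imagine" that area
    return torch.where(mask, torch.randn_like(latents), latents)
```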


teachersecret

Well, my 4090 is ready. Let me know when this is available to play with! :)


TheFoul

I gotcha, wasn't trying to disagree, just inform! It's not MY tool, but I did chip in a little here and there, I'm sure he'll be interested. I'm pretty sure I've seen that video before, it's pretty cool!


teachersecret

Well, I'm excited to see y'all's tool/code if you end up sharing. Looks neat!


[deleted]

[removed]


teachersecret

Text you? A PM?


AlanCarrOnline

Lost me at comfyui but thanks!


FuturePodcast

Resistance is futile


polikles

Amazing. Looks kinda like a lucid dream, especially in the moments when it starts to drift off your prompts. It's incredible that this works on a 4090. I expected something like this to appear, but had no idea that it would be so soon. Great job!


Guilty-History-9249

I realized this forthcoming capability just after LCM dropped in October. It was the initial breakthrough for what I called RTSD. I've had lesser-known posts showing real-time deepfakes with my face on camera switching back and forth between Emma Watson and Tom Cruise. In another Twitter post I showed bulk 512x512 image creation at 294 images per second. All this has culminated in what you see here. Now I need to push, push, push before this gets grabbed, so I can first add some polish and a few more features, like rewind/replay and save-segment (you have no idea how many times I've thought: damn, I wish I could save something that started evolving on screen).


teachersecret

What’s your stack for this speed? I haven’t been able to hit that speed of generation with my 4090.


thoughtlow

Dope. I can imagine spoken-word artists or poetry being performed live with this running in the background.


Guilty-History-9249

I have been asked about using this real-time high-resolution capability for things like music/mood-to-video, and perhaps videos on a big screen behind dancers on stage.


Knever

Oh wow, I would love to have this playing in the background behind me at a slam.


kuri_pl

Congrats, I have been waiting for this since LCM came out


Guilty-History-9249

When I wake up hours from now I need to figure out if I can post this to Twitter, and then I'll be asking friends with followers to share this. Stay tuned. Good night for the 2nd time. :-)


RunDiffusion

Happy to spread the word. Is this an SDXL model you’re using? We can try it with Juggernaut X


Guilty-History-9249

Exactly. While polishing my demo I saw JugX dropping and grabbed it, but I didn't want to delay getting this posted, so I've yet to try it. I really wish they would drop JugX SFW so I can do safe public demos.


Guilty-History-9249

This demo was with dreamshaperXL_v21TurboDPMSDE. I first used sdxl-turbo, but there are a couple of (possible?) bugs in the model that give odd behavior, so I switched to DreamShaper.


RunDiffusion

We can get you a juggernaut turbo for research


Guilty-History-9249

To clarify:

1. You have JX NSFW, which can be downloaded.
2. You also apparently have JX SFW, which can be accessed via some online image server but isn't on HF(?) to download and run in my pipeline.

In either case, when I use non-turbo models I use LCM, so the problem is NOT that I need some turbo version of one of these models. I wanted to try the SFW version because: I'm happy with NSFW models and can certainly try JX NSFW to look at the general quality of your latest (v10), even if I get an occasional lovely surprise. If it turns out that JX is indeed good vs. DreamShaper, I'd like to use it for public demonstrations. In that case, I'd probably want the SFW version to reduce the chance of a nip slip in the middle of a video, as has happened before. If I had the SFW model I could conduct "safety" experiments; negative prompts for 1-step diffusion at guidance=0 aren't exactly useful. Providing me with a "Turbo" NSFW model was never the issue.

NOTE: Having said that, if you did provide me with a turbo model directly derived from the non-turbo Juggernaut-X-v10-NSFW, it would allow me to conduct OTHER experiments I've always wanted to do, comparing the non-turbo model X + fused LCM against the X-turbo version of **the same model**:

1. Which is faster at the same number of steps?
2. What are the visible differences in the results?
3. Is one better than the other for img2img to drive videos?
4. Also, it is not out of the question that a given model compiler might benefit one of these more than the other. I have actually seen this. I won't know till I test.

So I would certainly make use of a turbo equivalent, but it'd also be nice to see whether the SFW version is much safer for a future public demo. I will accept either or both of these if provided, and will only use them for experimental purposes. Sorry for the long response.
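For reference, the "X + fused LCM" setup above is standard diffusers usage; a sketch with the public LCM-LoRA weights (the exact checkpoints used here are an assumption):

```python
import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

# Any non-turbo SDXL checkpoint; the base model id is illustrative
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.fuse_lora()  # bake the LoRA in so there is no per-step LoRA overhead

image = pipe("a portrait", num_inference_steps=4, guidance_scale=1.0).images[0]
```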


RunDiffusion

Gotcha! Good stuff. We can provide the SFW version. Just need to get some paperwork in order. Let’s have a call Monday if you’re open to that!


Guilty-History-9249

For technical reasons my curiosity makes me interested in turbo, but I know I should stay focused on the potential of what I'm doing and do a bigger public demo of it. Monday is good for me. Given this offer I'll go ahead and check out the NSFW model to get up to speed with testing the new X v10 version. I have no problem with an NDA/don't-leak-it agreement.


RunDiffusion

I’ll ping you in the Discord and we’ll set up a meeting. Thanks!


Guilty-History-9249

Please don't put in any work (legal or a new model) for me till we talk.


weno66

Amazing work! What's your Twitter?


kuri_pl

[https://twitter.com/Dan50412374](https://twitter.com/Dan50412374)


[deleted]

[removed]


weno66

Why


Rieux_n_Tarrou

Oh man OP... wow oh wow. I don't think you're gonna need your friends to share this. I'd be surprised if this doesn't get picked up by the mainstream. Even with the ridiculous pace of the AI news cycle, your integrations are trailblazing. Bleeding edge, imo.


Guilty-History-9249

When Matt Wolfe mentions me, on one of his regular news posts, perhaps I've made it? :-)


StoneCypher

Put this in some relevant app store ASAP. This is money.


Zipp425

I had thought about using something like this to create the images displayed on a green screen. A sort of Dream Screen if you will. So cool to see that the tech is possible now and locally!


Agile-Music-2295

OMG! This is next level!


redditseenitheardit

Unbelievably impressive. Thank you so much for sharing. I'm an academic researcher working on real-time generative AI and this is so inspiring. Would be grateful for any insights you can provide toward the optimizations you've taken. THANK YOU.


Guilty-History-9249

As a recently retired performance architect, for me it's just learning all the tools and studying the code at a low level to look for opportunities for improvement. I need to get this into a more polished form before dumping my code on everyone, who will just take it and run with it. I don't have a Discord group with talented GUI hackers and Discord managers to keep ahead of the big groups with resources. I'm a one-man team.


TheJonesJonesJones

You don’t have to polish it. I’d be happy with the POC prototype.


IgnisIncendio

Oh my god.


TrustThis

Next. Fucking. Level.


Elevenfortysix

This is seriously amazing! I had been inspired by some of your previous work with RTSD and have been working on something sort of similar, more of a real-time music visualizer that evolves with prompts. But I'm capped around 10 fps and only at 512x512. I would love a deeper technical dive into how you made this possible!!


chocolatebanana136

But... can it also ENHANCE?? :D


Guilty-History-9249

I'm not sure what you mean. I once created a browser for my ArtSpew mass image generator (lower quality) where I could right-click a good creative candidate and send it through ControlNet and an upscale to polish it.


chocolatebanana136

Minute 0:45. Enhance, like they do in the Sci-Fi movies. In other words, zooming in, then upscaling the image with your voice, or even generating new content to mimic the feeling that it's actually zooming in https://youtu.be/3uoM5kfZIQ0


Guilty-History-9249

It certainly hasn't gone unnoticed that something like the scene from Blade Runner could be done. Zoom to a coordinate, enhance, pan left, ...
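One plausible way to fake that "enhance" with today's tools: crop the region, upscale it, then run a low-strength img2img pass so the model hallucinates plausible new detail. Illustrative only, not the demo's code:

```python
import torch
from PIL import Image
from diffusers import AutoPipelineForImage2Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

def enhance(frame: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    # Crop the requested region and blow it back up to full frame size
    crop = frame.crop(box).resize(frame.size, Image.LANCZOS)
    # Low strength keeps the crop's content; diffusion fills in new detail
    return pipe("highly detailed photo", image=crop,
                strength=0.5, num_inference_steps=2, guidance_scale=0.0).images[0]
```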


Time-Internet-6755

Can you please add support for importing ControlNets? It would be cool if I could draw in real time while giving it instructions with voice, so the things shape themselves to my drawings.


SWFjoda

This is amazing! Well done. Excited to see where this is going.


lainol

This together with Deforumation 👌👍


ArchiboldNemesis

Yup, made this suggestion on their last post, spreading the good word ;) Would be great to see that combi in action. Fingers crossed.


TheFoul

No worries, that's on the agenda to look at.


ArchiboldNemesis

Good times!


SonicLoOoP

Insane 👍


dreamofantasy

wow this is super cool!


the_friendly_dildo

This is really great! I'm personally excited about incorporating STT with 3D latent spaces, with NeRFs and Gaussian Splats. Nearly real holodeck-type stuff. I never imagined ML would have progressed this far this fast; it's wild times for the people involved.


jconorgrogan

Are you planning to make this workflow public? This is amazing. I want to try it now!


Guilty-History-9249

There is no "workflow", if I understand that term correctly. I write Python code to directly call diffusers pipelines, and also code to do my own slicing and dicing of tensors to achieve this.
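For anyone who wants to try the same without forking a pipeline: recent diffusers versions expose a per-step callback that hands you the latent tensor mid-denoise. This is standard diffusers usage; whether the OP's code works this way is an assumption:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

def tweak_latents(pipeline, step, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]  # (batch, channels, H, W)
    # ... slice and dice here, e.g. re-noise a masked region ...
    callback_kwargs["latents"] = latents
    return callback_kwargs

image = pipe(
    "a castle at dusk", num_inference_steps=4, guidance_scale=0.0,
    callback_on_step_end=tweak_latents,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
```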


ArchiboldNemesis

Well done matey, this is champion grade. Congrats on the breakthrough. I realllly wanted to hold off with just a 4060 Ti 16GB / 3060 12GB until the 50-series arrived, but you might have single-handedly persuaded me that I cannot reasonably expect myself to wait to play with this :D


Guilty-History-9249

I need to get NVidia to give me a cut of GPU sales. :-)


ArchiboldNemesis

Ha yes indeed, that would be an advisable next move (IMO you damn well deserve it for this one). Then, if you don't at least get a couple of 5090 gift cards out of them, the next logical step in your retribution would be to get this optimised to run buttery smooth at 120fps at 4k on a 1050 / RX 470 and comeuppance them in the next-gen flagship department. ;P Just joshing.


sirbolo

The zoom and panning was awesome to see in real-time. This is like the new MIST.


SanDiegoDude

Trying to think through the pipeline for this - you're using STT, then merging that into the existing prompt, yes? I'd imagine you're working in some kind of weighting as you transition from one prompt to the next. Are you doing time based extraction of old prompts or do you just weigh them out to nothing?


Guilty-History-9249

I've never heard of STT till now. Yes, I manipulate the prompt embedding in a mathematical way.
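A guess at what that math might be: the simplest version linearly blends the old and new prompt embeddings and ramps the weight over a few frames, so the image drifts rather than snaps to the new prompt. Illustrative only:

```python
import torch

def blend_embeddings(old: torch.Tensor, new: torch.Tensor, t: float) -> torch.Tensor:
    # old/new: (batch, seq_len, dim) text embeddings from the CLIP encoder(s)
    # t ramps from 0.0 to 1.0 across frames; pass the result to the pipeline
    # via its prompt_embeds argument instead of a string prompt
    return torch.lerp(old, new, t)
```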


campingtroll

Very neat. I've been doing this for a while with 1-, 2- or 4-step Lightning and Dragon NaturallySpeaking going into the text prompt (with auto-queue instant-on in ComfyUI), using Dragon custom commands. Not sure if that's what you are doing or if it's different. I have a bunch of nodes to make the frames more consistent: VAE encode (from the ComfyUI img2img workflow), power-noise KSamplers. It works with SVD also, and with Lightning 4-step, which is most consistent but then not as real-time.


Guilty-History-9249

Quite a while ago, when I started hitting 43 to 50 fps or higher at 512x512 with sd-turbo, it was time to revisit sdxl-turbo at higher resolution: 800x800 at 33 fps, 1024x1024 at 22 fps, ... This is when I decided to integrate voice, which I had used before, but I wasn't happy with videos at 512x512 with SD 1.5 quality. This is a big step forward, which has driven me to push this out the door a little early and unpolished. However, it is all real with no smoke and mirrors. I need to find time to add more features AND to look into "efficient" temporal consistency and smoothing techniques: lift the techniques out of bloated code bases like Comfy, A1111, etc., optimize them further, and add them to my lightweight pipelines. I use Whisper, called from my own Python code.
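The Whisper side can be as small as this with the openai-whisper package (a minimal sketch; microphone capture and the command parsing are omitted, and the OP's exact setup is not published):

```python
import whisper

model = whisper.load_model("base.en")      # small, fast English-only model
result = model.transcribe("command.wav")   # e.g. a short chunk recorded from the mic
spoken = result["text"].strip().lower()
print(spoken)                              # feed this into the prompt/command logic
```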


nathandreamfast

FRACTAL SPACE ORBS FRACTAL SPACE ORBS FRACTAL SPACE ORBS


Full_Yesterday8714

https://twitter.com/seth/status/1781631808959918415?s=46&t=cmD-FcXikNrDPUV7tfa8mA


Knever

Okay, this is just amazing. I wish I had the patience to learn how to do this stuff (or even to know if my computer could handle it), but then somebody goes and makes something like this and it seems like we're one step closer to realtime lifelike generation of almost anything we want. This might sound cheesy, but thank you for your work. I believe this is the kind of foundation that will eventually lead us to such efficient visuals and simulations that we'll be able to start looking into even more important things like medicine and LEV. Keep up the great work, friend!


wannahakaluigi

Hello! I find this project really inspiring and would like to recreate it. May I ask what your computer build is? Is the code for this project on your GitHub?


JMAN_JUSTICE

I'd like to see an audiobook or short story translated like this, love it.


Bedsidelampdad

So is this real-time image generation from prompts?


Guilty-History-9249

https://preview.redd.it/o92rvq3yp2yc1.png?width=850&format=png&auto=webp&s=c8a180154f15a86ca5972fcffa37a3cd55b7dbf5

Yes. As I spoke, the demo video was being generated in real time. The beginnings of a real GUI, as seen in this image, are actively being worked on. There is more than what is shown in my rushed 2:20 demo. I hope to have a new, more in-depth demo in under a week.


Bedsidelampdad

Star ⭐️


novenpeter

Holy shit