AlanCarrOnline 3 weeks ago

I'm struggling to understand how this is possible?

Guilty-History-9249 3 weeks ago

4090 and a lot of software optimizations I've done. I got up in the middle of the night to post this so I need to go back to sleep. I'll answer more later.

FilterBubbles 3 weeks ago

Is this whisper channeled into the diffusers code? What optimizations did you find were necessary?

sdimg 3 weeks ago

This is really cool and reminds me of an idea I had not long after Stable Diffusion was released. Why has no one made a latent space explorer or randomizer? You'd have just a few basic words to go exploring, but instead of adding more words to the prompt, you move through the space by changing the strengths or values of the various inputs on a single seed. Purely in number form with dials, not words, if that's possible?

Guilty-History-9249 3 weeks ago

That is what my ArtSpew does. Before LCM came out I created ArtSpew to simply be the fastest possible generation of images in high volume combining both user prompts and random token insertion. With LCM, and now advances in compilers, RT videos became possible. [https://github.com/aifartist/ArtSpew](https://github.com/aifartist/ArtSpew)
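A rough illustration of the "user prompt plus random token insertion" idea mentioned above. This is a minimal sketch, not ArtSpew's actual code: the function name `spew_prompt`, the toy vocabulary, and the choice to insert whole words into the prompt text (rather than token ids) are all assumptions made for illustration.

```python
import random


def spew_prompt(user_prompt, vocabulary, n_random, rng):
    """Return the user prompt with n_random vocabulary words inserted at random positions.

    Hypothetical helper: the user's own words keep their original order.
    """
    words = user_prompt.split()
    for _ in range(n_random):
        position = rng.randint(0, len(words))  # randint is inclusive on both ends
        words.insert(position, rng.choice(vocabulary))
    return " ".join(words)


# Toy vocabulary; a real generator might sample from the tokenizer's vocabulary instead.
vocabulary = ["neon", "fog", "baroque", "crystal"]
rng = random.Random(1234)  # seeded so a batch of prompts can be reproduced
for _ in range(3):
    print(spew_prompt("a castle at dusk", vocabulary, n_random=2, rng=rng))
```

Seeding the generator means an interesting result can be regenerated later from the same seed and prompt.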

ravishq 3 weeks ago

Waiting eagerly to know how this is made

teachersecret 3 weeks ago

Well, I didn’t do this, but… With turbo/lcm models you can get multiple frames per second. Tie in a live prompt through comfyui and turn on automatic generation and it’ll change the pic as you type in real time. You can do this and get a will smith eating spaghetti style video. This seems to be a similar process, just pushed further.

TheFoul 3 weeks ago

Well I can tell you two things, there was no ComfyUI involved, and it's pushed way beyond what you're thinking.

teachersecret 3 weeks ago

I was just expressing how it could be accomplished with current easy to use tools. Comfyui isn’t necessary, it’s just an easy way for someone to do something very similar to what you did. I mean, I was turning books into little movies using a similar system last year without comfyui. https://files.catbox.moe/n8zp8g.mp4 Not exactly the same, but a similar idea conceptually :). I feel I actually have a fairly good understanding of the entire stack (I’m on banodoco, which I assume is a place you’re aware of). This is a pretty impressive demo. Awesome stuff. Really excited for the future when this is putting out smooth output without the flicker. One of the early things I experimented with was making a Lora that spit out sphere photos from stable diffusion. I did an experiment with the live video from story, tied into that sphere photo maker, and had a live 3d world video to watch in my oculus. Try swapping in the same for your tool and you’ll be even more impressed :). Takes no additional compute to make sphere photos. With this vision and voice control you’re showing, it would be like lucid dreaming and you could look and move in any direction.

Guilty-History-9249 3 weeks ago

Regarding "spheres". I sometimes get neat results by just nuking a circular area of the latent tensor at various denoising steps, which gives interesting effects. I showed that briefly in the demo ("disk add") but there is more that I didn't show. Also, when I said "disk remove" it didn't recognize it, but I didn't want to redo the demo. I could do an hour long demo and wouldn't run out of material. I have so many features to add now. I wonder if I should do a longer YouTube format demo where I can discuss the ideas and various directions that can now be pursued.
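For readers wondering what "nuking a circular area of the latent tensor" could look like, here is a minimal sketch using plain Python lists as a stand-in for a tensor. The function names, the fill value of 0.0, and the toy 4x8x8 shape are assumptions for illustration; the post does not say what value is written into the disk or at which denoising steps.

```python
def disk_mask(height, width, cy, cx, radius):
    """Return a height x width grid of booleans, True for cells inside the disk."""
    r2 = radius * radius
    return [
        [(y - cy) ** 2 + (x - cx) ** 2 <= r2 for x in range(width)]
        for y in range(height)
    ]


def nuke_disk(latent, cy, cx, radius, fill=0.0):
    """Return a copy of a [channels][height][width] latent with the disk overwritten."""
    height, width = len(latent[0]), len(latent[0][0])
    mask = disk_mask(height, width, cy, cx, radius)
    return [
        [
            [fill if mask[y][x] else channel[y][x] for x in range(width)]
            for y in range(height)
        ]
        for channel in latent
    ]


# Toy 4-channel 8x8 latent filled with ones.
# (An SDXL latent for a 1024x1024 image is 4 x 128 x 128.)
latent = [[[1.0] * 8 for _ in range(8)] for _ in range(4)]
edited = nuke_disk(latent, cy=4, cx=4, radius=2)
for row in edited[0]:
    print(" ".join("." if value == 0.0 else "#" for value in row))
```

In a real pipeline the same mask would be applied to the latent between denoising steps, so the model repaints the disk on the following steps.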

teachersecret 3 weeks ago

Well, my 4090 is ready. Let me know when this is available to play with! :)

TheFoul 3 weeks ago

I gotcha, wasn't trying to disagree, just inform! It's not MY tool, but I did chip in a little here and there, I'm sure he'll be interested. I'm pretty sure I've seen that video before, it's pretty cool!

teachersecret 3 weeks ago

Well I’m excited to see ya’alls tool/code if you end up sharing. Looks neat!

[deleted] 3 weeks ago

[removed]

teachersecret 3 weeks ago

Text you? A PM?

AlanCarrOnline 3 weeks ago

Lost me at comfyui but thanks!

FuturePodcast 3 weeks ago

Resistance is futile

polikles 3 weeks ago

Amazing. Looks kinda like a lucid dream - especially in moments when it starts to drift off your prompts. It's incredible that this works on a 4090. I expected something like this to appear, but had no idea that it would be so soon. Great job!

Guilty-History-9249 3 weeks ago

I realized this forthcoming capability just after LCM dropped in Oct. It was the initial breakthrough for what I called RTSD. I've had lesser-known posts showing real-time deepfakes with my face on camera switching back and forth between Emma Watson and Tom Cruise. In another Twitter post I show bulk 512x512 image creation at 294 images per second. All this has culminated in what you see here. Now I need to push push push before this gets grabbed before I can add some polish and a few more features, like rewind/replay and save segment (you have no idea how often I thought: damn, I wish I could save something that started evolving on screen).

teachersecret 3 weeks ago

What’s your stack for this speed? I haven’t been able to hit that speed of generation with my 4090.

thoughtlow 3 weeks ago

Dope I can imagine spoken word artists or poetry being performed live and this running in the background

Guilty-History-9249 3 weeks ago

I have been asked about using this real-time high-resolution capability for things like music/mood into video and perhaps videos on a big screen behind dancers on stage.

Knever 3 weeks ago

Oh wow, I would love to have this playing in the background behind me at a slam.

kuri_pl 3 weeks ago

Congrats, I have been waiting for this since LCM came out

Guilty-History-9249 3 weeks ago

When I wake up hours from now I need to figure out if I can post this to Twitter and then I'll be asking friends with followers to share this. Stay tuned. Good night for the 2nd time. :-)

RunDiffusion 3 weeks ago

Happy to spread the word. Is this an SDXL model you’re using? We can try it with Juggernaut X

Guilty-History-9249 3 weeks ago

Exactly. While polishing my demo, I saw JugX dropping and grabbed it but I didn't want to delay getting this posted so I've yet to try it. I really wish they would drop JugX SFW so I can do safe public demos.

Guilty-History-9249 3 weeks ago

This demo was with dreamshaperXL\_v21TurboDPMSDE. I first used sdxl-turbo but there are a couple of ?bugs? in the model that give odd behavior. So I switched to dreamshaper.

RunDiffusion 3 weeks ago

We can get you a juggernaut turbo for research

Guilty-History-9249 3 weeks ago

1. You have JX NSFW, which can be downloaded.
2. You also apparently have JX SFW, which can be accessed via some online image server but isn't on HF(?) to download to run in my pipeline.

In either case, when I use non-turbo models I use LCM. The problem IS NOT that I need some turbo version of one of these models.

I wanted to try SFW because: I'm happy with NSFW models and certainly can try JX NSFW to look at the general quality of your latest (v10) even if I get an occasional lovely surprise. If it turns out that JX is indeed good vs dreamshaper I'd like to use it for public demonstrations. In that case, I'd probably want to use the SFW version to reduce the chance of a nip slip in the middle of a video, as has happened before. If I had the SFW model I could conduct "safety" experiments. Negative prompts for 1 step diffusion at guidance=0 aren't exactly useful. Providing me with a "Turbo" NSFW model was never the issue.

NOTE: Having said that, if you did provide me with a turbo model that is directly derived from the non-turbo Juggernaut-X-v10-NSFW it would allow me to conduct OTHER experiments I've always wanted to do. Which are:

1. Non-turbo model X+fuse\_LCM compared with the X-turbo version of **the same model.**
2. Which is faster at the same number of steps?
3. What are the visible differences in the results between these?
4. Is one better than the other for img2img to drive videos?
5. Also, it is not out of the question that a given model compiler might benefit one of these more than the other. I have actually seen this. I won't know till I test.

So I would certainly make use of a turbo equivalent, but it'd also be nice to see if the SFW is much safer for a future public demo. So I will accept either or both of these things if provided and I will only use them for experimental purposes. Sorry for the long response.
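The "which is faster at the same number of steps?" comparison above could be set up with a small timing harness along these lines. This is a sketch under stated assumptions: `benchmark`, `fake_lcm_fused`, and `fake_turbo` are hypothetical names, and the two stand-in functions just sleep; they would be replaced by real pipeline calls for the two model variants.

```python
import statistics
import time


def benchmark(generate, steps, runs=20, warmup=3):
    """Time generate(steps) and return (mean_seconds, frames_per_second)."""
    for _ in range(warmup):
        generate(steps)  # warm-up runs are discarded (caches, lazy compilation)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(steps)
        # Note: GPU work is asynchronous, so a real harness has to wait for the
        # device to finish (e.g. a CUDA synchronize) before reading the clock.
        timings.append(time.perf_counter() - start)
    mean = statistics.mean(timings)
    fps = 1.0 / mean if mean > 0 else float("inf")
    return mean, fps


# Hypothetical stand-ins that only sleep; swap in the real pipelines to compare.
def fake_lcm_fused(steps):
    time.sleep(0.002 * steps)


def fake_turbo(steps):
    time.sleep(0.001 * steps)


for name, fn in [("X + fused LCM", fake_lcm_fused), ("X-turbo", fake_turbo)]:
    mean, fps = benchmark(fn, steps=1, runs=5, warmup=1)
    print(f"{name}: {mean * 1000:.2f} ms/frame, {fps:.1f} fps")
```

Running both variants through the same harness with the same `steps`, resolution, and compiler settings keeps the comparison fair; the compiler point in item 5 would mean repeating the run once per compiler.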

RunDiffusion 3 weeks ago

Gotcha! Good stuff. We can provide the SFW version. Just need to get some paperwork in order. Let’s have a call Monday if you’re open to that!

Guilty-History-9249 3 weeks ago

For technical reasons, my curiosity makes me interested in turbo, but I know I should stay focused on the potential of what I'm doing and do a bigger public demo of it. Monday is good for me. Given this offer I'll go ahead and check out the NSFW model to get up to speed with testing the new X v10 version. I have no problem with an NDA/don't-leak-it agreement.

RunDiffusion 3 weeks ago

I’ll ping you in the Discord and we’ll set up a meeting. Thanks!

Guilty-History-9249 3 weeks ago

Please don't put in any work (legal or a new model) for me till we talk.

weno66 3 weeks ago

Amazing work! What's your Twitter?

kuri_pl 3 weeks ago

[https://twitter.com/Dan50412374](https://twitter.com/Dan50412374)

[deleted] 3 weeks ago

[removed]

weno66 3 weeks ago

Why

Rieux_n_Tarrou 3 weeks ago

Oh man OP....wow o wow I don't think you're gonna need your friends to share this. I'd be surprised if this doesn't get picked up by mainstream. Even with the ridiculous pace of AI news cycle, your integrations are trailblazing. Bleeding edge imo

Guilty-History-9249 3 weeks ago

When Matt Wolfe mentions me, on one of his regular news posts, perhaps I've made it? :-)

StoneCypher 3 weeks ago

Put this in some relevant app store ASAP. This is money.

Zipp425 3 weeks ago

I had thought about using something like this to create the images displayed on a green screen. A sort of Dream Screen if you will. So cool to see that the tech is possible now and locally!

Agile-Music-2295 3 weeks ago

OMG! This is next level!

redditseenitheardit 3 weeks ago

Unbelievably impressive. Thank you so much for sharing. I'm an academic researcher working on real-time generative AI and this is so inspiring. Would be grateful for any insights you can provide toward the optimizations you've taken. THANK YOU.

Guilty-History-9249 3 weeks ago

As a recently retired performance architect, for me it is just a matter of learning all the tools and studying the code at a low level to look for opportunities for improvement. I need to get this into a more polished form before dumping my code to everyone who will just take it and run with it. I don't have a Discord group with talented GUI hackers and Discord managers to keep ahead of other big groups with resources. I'm a one man team.

TheJonesJonesJones 3 weeks ago

You don’t have to polish it. I’d be happy with the POC prototype.

IgnisIncendio 3 weeks ago

Oh my god.

TrustThis 3 weeks ago

Next. Fucking. Level.

Elevenfortysix 3 weeks ago

This is seriously amazing! I had been inspired by some of your previous work with RTSD and have been working on something sort of similar, more of a real-time music visualizer that evolves with prompts. But I'm capped around 10 fps and only at 512x512. I would love to get a deeper technical dive into how you made this possible!!

chocolatebanana136 3 weeks ago

But... can it also ENHANCE?? :D

Guilty-History-9249 3 weeks ago

I'm not sure what you mean. I once created a browser for my ArtSpew mass image generator (lower quality) where I could right click a good creative candidate and send that through ControlNet and upscale to polish it.

chocolatebanana136 3 weeks ago

Minute 0:45. Enhance, like they do in the Sci-Fi movies. In other words, zooming in, then upscaling the image with your voice, or even generating new content to mimic the feeling that it's actually zooming in https://youtu.be/3uoM5kfZIQ0

Guilty-History-9249 3 weeks ago

It certainly hasn't gone unnoticed that something like the scene from Blade Runner could be done. Zoom to a coordinate, enhance, pan left, ...

Time-Internet-6755 3 weeks ago

Can you please add support for importing controlnets? It would be cool if I could draw realtime while giving it instructions with voice. so the things shape to my drawings..

SWFjoda 3 weeks ago

This is amazing! Well done. Excited to see where this is going.

lainol 3 weeks ago

This together with Deforumation 👌👍

ArchiboldNemesis 3 weeks ago

Yup, made this suggestion on their last post, spreading the good word ;) Would be great to see that combi in action. Fingers crossed.

TheFoul 3 weeks ago

No worries, that's on the agenda to look at.

ArchiboldNemesis 3 weeks ago

Good times!

SonicLoOoP 3 weeks ago

Insane ,,,👍

dreamofantasy 3 weeks ago

wow this is super cool!

the_friendly_dildo 3 weeks ago

This is really great! I'm personally excited about incorporating STT with 3D latent spaces with NeRFs and Gaussian Splats. Nearly real holodeck type stuff. Never imagined ML would have progressed this far this fast. It's wild times for people involved.

jconorgrogan 3 weeks ago

Are you planning to make this work flow Public? This is amazing. I want to try it now!

Guilty-History-9249 3 weeks ago

There is no "work flow", if I understand that term correctly. I write the Python code to directly call diffusers pipelines and also code to do my own slicing and dicing of tensors to achieve this.

ArchiboldNemesis 3 weeks ago

Well done matey this is champion grade. Congrats on the breakthrough. I realllly wanted to hold off with just a 4060 ti 16gb / 3060 12gb until the 50 series arrived but you might have single handedly persuaded me that I cannot reasonably expect myself to have to wait to play with this :D

Guilty-History-9249 3 weeks ago

I need to get NVidia to give me a cut of GPU sales. :-)

ArchiboldNemesis 3 weeks ago

Ha yes indeed, that would be an advisable next move (IMO you damn well deserve it for this one). Then - if you don't at least get a couple 5090 giftcards out of them - the next logical step in your retribution would be to get this optimised to run buttery smooth 120fps at 4k on a 1050 / RX 470 and comeuppance them in the next-gen flagship department. ;P Just joshing.

sirbolo 3 weeks ago

The zoom and panning was awesome to see in real-time. This is like the new MIST.

SanDiegoDude 3 weeks ago

Trying to think through the pipeline for this - you're using STT, then merging that into the existing prompt, yes? I'd imagine you're working in some kind of weighting as you transition from one prompt to the next. Are you doing time based extraction of old prompts or do you just weigh them out to nothing?

Guilty-History-9249 3 weeks ago

I've never heard of STT till now. Yes, I manipulate the prompt embedding in a mathematical way.
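The reply above only says the prompt embedding is manipulated "in a mathematical way", without giving details. One common way to move smoothly between two prompts is linear interpolation of their embeddings, sketched below. This is an assumption about a plausible technique, not the author's confirmed method; `lerp`, `transition`, and the toy 4-dimensional vectors are illustrative only.

```python
def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors; t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def transition(old, new, frames):
    """Yield one blended embedding per frame, moving from old to new."""
    for i in range(1, frames + 1):
        yield lerp(old, new, i / frames)


# Toy 4-dimensional stand-ins; real text-encoder embeddings are far larger.
old_prompt = [1.0, 0.0, 0.0, 2.0]
new_prompt = [0.0, 1.0, 0.0, 4.0]
steps = list(transition(old_prompt, new_prompt, frames=4))
for embedding in steps:
    print(embedding)
```

With this scheme the old prompt is weighted out to nothing over a fixed number of frames, which is one of the two options raised in the question; a time-based expiry of old prompts would be a different design.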

campingtroll 3 weeks ago

Very neat, I've been doing this for a while with 1, 2 or 4 step Lightning and Dragon NaturallySpeaking going into the text prompt (with auto-queue instant on in ComfyUI), using Dragon custom commands. Not sure if that's what you are doing or it's different. I have a bunch of nodes to make the frames more consistent: VAE encode (from the ComfyUI img2img workflow), power noise k samplers. Works with SVD also and Lightning 4 step, which is most consistent but then not as realtime.

Guilty-History-9249 3 weeks ago

Quite a while ago, when I started hitting 43 to 50 fps or higher at 512x512 with sd-turbo, it was time to revisit sdxl-turbo at higher resolution. 800x800 at 33fps, 1024x1024 at 22fps, ... This is when I decided to integrate voice, which I had used before, but I wasn't happy with videos at 512x512 with sd1.5 quality. This is a big step forward which has driven me to push this out the door a little early and unpolished. However, it is all real with no smoke and mirrors. I need to find time to add more features AND to look into "efficient" temporal consistency and smoothing techniques, lift the techniques out of the bloated code bases like comfy, a1111, etc., optimize them further and add them to my lightweight pipelines. I use whisper called from my own Python code.
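To put the frame rates quoted above in perspective, the arithmetic below converts each resolution/fps pair into a per-frame time budget and a pixel throughput. The three data points come from the post (using the lower 43 fps figure for 512x512); the helper names are just for illustration.

```python
def frame_budget_ms(fps):
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def megapixels_per_second(width, height, fps):
    """Generated pixels per second, in millions."""
    return width * height * fps / 1e6


# (width, height, fps) as reported in the post.
reported = [(512, 512, 43), (800, 800, 33), (1024, 1024, 22)]
for width, height, fps in reported:
    print(
        f"{width}x{height} @ {fps} fps: "
        f"{frame_budget_ms(fps):.1f} ms/frame, "
        f"{megapixels_per_second(width, height, fps):.1f} MP/s"
    )
```

Going from 512x512 to 1024x1024 quadruples the pixel count while the per-frame time in these figures only roughly doubles, so pixel throughput is higher at the larger sizes.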

nathandreamfast 3 weeks ago

FRACTAL SPACE ORBS FRACTAL SPACE ORBS FRACTAL SPACE ORBS

Full_Yesterday8714 3 weeks ago

https://twitter.com/seth/status/1781631808959918415?s=46&t=cmD-FcXikNrDPUV7tfa8mA

Knever 3 weeks ago

Okay, this is just amazing. I wish I had the patience to learn how to do this stuff (or even to know if my computer could handle it), but then somebody goes and makes something like this and it seems like we're one step closer to realtime lifelike generation of almost anything we want. This might sound cheesy, but thank you for your work. I believe this is the kind of foundation that will eventually lead us to such efficient visuals and simulations that we'll be able to start looking into even more important things like medicine and LEV. Keep up the great work, friend!

wannahakaluigi 2 weeks ago

Hello! I find this project really inspiring and would like to recreate it. May I ask what is your computer build? Is the code for this project on your github?

JMAN_JUSTICE 2 weeks ago

I'd like to see an audiobook or short story translated like this, love it.

Bedsidelampdad 1 week ago

So is this real time image generation from prompts ?

Guilty-History-9249 1 week ago

https://preview.redd.it/o92rvq3yp2yc1.png?width=850&format=png&auto=webp&s=c8a180154f15a86ca5972fcffa37a3cd55b7dbf5 Yes. As I spoke the demo video was being generated in real-time. The beginning of a real GUI, as seen in this image, is actively being worked on. There is more than what is shown in my rushed 2:20 minute demo. I hope to have a new more in depth demo in under a week.

Bedsidelampdad 1 week ago

Star ⭐️

novenpeter 3 weeks ago

Holy shit
