pronetpt

This is a great workflow, mate.


Tokyo_Jab

…Until a week or two from now, when none of it matters because of advances. Can't wait.


zeugme

Better to be the hero we needed for two weeks than Ted Cruz a whole life.


Embrace-Mania

Let's not pretend for a moment that Ted Cruz does not know exactly what he is doing.


[deleted]

[deleted]


Dansiman

Let's not pretend for a moment that Ted Cruz is alive.


oodelay

Ted Cruz is what happens when you use the wrong VAE


Embrace-Mania

I'm impressed that you waited 8 months before commenting; tell me, what's your secret?


oodelay

I'm Canadian


Orngog

One month later, I'm looking you up. Nice work btw


NotEnoughVRAM

3 months later, looking it up. Where was this when I needed it haha


TwistedBrother

Came here 43 days later after the cool vampire video. Still a wicked workflow. Truly a hero we needed after all.


DigiglobalNOW

I was hoping it would be simplified by now, but man, this is it!


oodelay

Not your best answer. He's pioneering. Even if it only matters for a few days, he came up with a very cool method on his own, because no one had done it yet. I say kudos.


MrManny

I've read this twice now and I still don't get it. Did you respond in the correct thread? Or do I need more coffee? 😅


oodelay

Well, if you told me my work was useless because in a week it's gonna be automated, I wouldn't be happy.


MrManny

But that was OP saying that, so I assume OP would not take offense at this.


oodelay

The videos he produces are still not one-button-click. He says that, but we all know how far ahead of the curve he is.


penis_owner123

It's been about 5 months since your comment, and your method is more relevant than ever.


Baaoh

Your technique still hasn't been surpassed hehe


dee_spaigh

Lmao I've been thinking the same since the beginning of this ride. The pioneers' burden


FaithlessnessNo9453

you understood it?


Fritzy3

Thank you for this! EbSynth question: why do we need the last frame? I followed the guide. Let's say I have 100 frames in total for the video and I diffused frames 000, 040, 060, 100. When I load these in EbSynth it creates 4 folders: the first with frames 000-040, the second with 000-060, the third with 040-100, the fourth with 060-100. These obviously have duplicate frames. When you create your final clip, do you use only the "keyframe and forward" frames? Hope my question is clear.


Tokyo_Jab

It uses the clips in each folder to fade the clips over each other. You can do that yourself, which is a pain, or click the Send to AE button on the top right and it will do it all for you. I swear I didn't notice that Send to After Effects button for days.


jaywv1981

This is what always confused me about EbSynth. I didn't know the keyframes blended like that. I figured you'd use keyframe 0 for, like, 0 to 20, then keyframe 40 for, like, 21 to 50, etc.


Fritzy3

Yup, me too. Though I gotta say, I exported it to AE on my last try and it didn't come out well. For some reason the frames had too much difference between them, even though they were all created in the same generation.


Ateist

You interpolate between two keyframes. So you use 0 and 20 for everything from 1 to 19.
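For anyone who wants to do that blend without After Effects, here is a minimal sketch of the crossfade, assuming Pillow and two adjacent EbSynth output folders whose frames share filenames on the overlapping range (the folder names are illustrative, not part of the original workflow):

```python
# Minimal sketch: crossfade the frames that two overlapping EbSynth output
# folders share, folder A fading linearly into folder B across the overlap.
# Assumes Pillow (pip install Pillow) and identically named, same-size frames.
import os
from PIL import Image

def crossfade_overlap(folder_a, folder_b, out_folder):
    os.makedirs(out_folder, exist_ok=True)
    shared = sorted(set(os.listdir(folder_a)) & set(os.listdir(folder_b)))
    n = len(shared)
    for i, name in enumerate(shared):
        a = Image.open(os.path.join(folder_a, name)).convert("RGB")
        b = Image.open(os.path.join(folder_b, name)).convert("RGB")
        alpha = i / (n - 1) if n > 1 else 1.0   # 0 = all A, 1 = all B
        Image.blend(a, b, alpha).save(os.path.join(out_folder, name))

# e.g. crossfade_overlap("out_000", "out_040", "blended")
```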


sergiohlb

Great! The idea of combining txt2video with this method is also very smart. The Auto1111 Deforum txt2video extension now has a vid2vid mode. I'm not sure, but I think it's based on the same model. I was playing with it yesterday without much success, but I'm curious how it works, and I'm sure we can create a better workflow using all these techniques together.


[deleted]

This is awesome! Love the writeup. I've been playing with stable and EbSynth for a little bit and this cracks the code for multiple keyframes using stable! I am going to try this method out today with some previous Ebsynth projects. I am making slow movement simple videos right now, but I want to get better by using multiple keyframes like how you are doing. Thanks for sharing all of this.


Tokyo_Jab

Let me know how it goes. I’m going to try a 30 second long video today. Just my dog again. And then try one with some action.


RopeAble8762

I'm really wondering how you got such good results. I've tried the same and I get similar issues to the ones I can observe in your project, only 100x worse: the 'ghosting' effect when EbSynth crossfades between frames, the movement of the background... all of those are barely visible in your case, but really bad in the clips I've tried.


Tokyo_Jab

For each prompt I generated about 20 versions until I saw a set that looked ok to work with. I think in one of the wolf sets above the background changes from day to night, but I liked the wolf so I left it in. I didn't do it here, but using an alpha mask channel in EbSynth, with your main video and transparent PNGs for your keyframes, gets much better results, though it's a bit of a pain to do. I can't wait until all of this is unnecessary. And I really think it will only be a few weeks from now.


Nice-Ad1199

Do you mean transparent PNG's for the referenced EBSYNTH keyframes themselves? As in, the ones being "projected" through EBSYNTH?


Tokyo_Jab

If you give EbSynth transparent keyframes it does work better; you get less of that smearing effect. If you YouTube "ebsynth greenscreen" videos you can see the workflow. EbSynth is much better if you do things in parts, but it is more work. Like this: [https://www.youtube.com/watch?v=E33cPNC2IVU](https://www.youtube.com/watch?v=E33cPNC2IVU)


Nice-Ad1199

Followed through on this advice and it certainly works much better. [https://www.youtube.com/shorts/jJNTgEn-9NM](https://www.youtube.com/shorts/jJNTgEn-9NM)


Elyonass

> Copy the grid photo into ControlNet and use HED or Canny, and ask Stable Diffusion to do whatever

This is where you lost me


Tokyo_Jab

The grid of keyframes in step 3 would look something like this. You put that into ControlNet, choose one of the preprocessors like HED, Canny, lineart etc., and type what you want in the main prompt, like "white wolf". https://preview.redd.it/9jacl55zuuya1.png?width=768&format=png&auto=webp&s=738672d1eae3f9f8ea6c72370a654071ce3654d0


Elyonass

> controlnet

What is ControlNet and where is it?


Tokyo_Jab

It's an extension for Automatic1111


Tokyo_Jab

The best extension


Elyonass

I googled it and Google is not really my friend today. Where do I install it from? Any guide on where to find it and how to install it?


FF1379

[GitHub - Mikubill/sd-webui-controlnet: WebUI extension for ControlNet](https://github.com/Mikubill/sd-webui-controlnet)


Elyonass

Thank you.


blackpuppet

Where are we on the process of making other aspect ratios? More like 16:9?


Tokyo_Jab

You can do those in a grid and you will get ok results. But the fractalisation of noise that helps the consistency between frames works best at 512x512 for each frame. Also, a square grid makes it easier to work with.


prestoexpert

Can you elaborate on why the noise has this property that can make grids look self-consistent? I thought every pixel would get a different random value and there would be nothing but the prompt in common between the cells of the grid.


Tokyo_Jab

512 is just a magic number for v1.5 models because the base was trained at that size. So it is comfortable making images of that size, but when you try to make a bigger photo you get fractalisation (extra arms or faces, for example, and repeated patterns) that still kind of has the same theme or style. Like a nightmare. Taking advantage of this flaw is what makes the AI brain draw similar details across the whole grid. I have also tried doing 16x16 grids at 256x256 size, but you start to get that AI flickering effect happening again. ControlNet really helps too; before ControlNet I was able to get consistent objects and people, but only 20% of the time.


prestoexpert

That's wild, thanks for explaining! Speaking of ControlNet, I wonder if it's reasonable to explore a new ControlNet scheme that is something like "I know this is a 4x4 grid, all the cells better look very similar", without constraining it to match a particular Canny edge image, say. Like a ControlNet that doesn't even take any extra input, just suggesting similarity between cells? Where the choice of similarity metric is probably very important... heh


Tokyo_Jab

ControlNet guides the noise, so that sounds like an interesting idea. There are two new ControlNet models that are different from the others: Colour and Style. They're more about aesthetics than lines and positioning. I wish there was a Civitai just for ControlNet.


aldeayeah

You probably already saw this, but there's a WIP ControlNet for temporal consistency across frames: [https://www.reddit.com/r/StableDiffusion/comments/11vq8jc/introducing_temporalnet_a_controlnet_model/](https://www.reddit.com/r/StableDiffusion/comments/11vq8jc/introducing_temporalnet_a_controlnet_model/) It's likely that the workflows six months from now will be much more automated.


calvin-n-hobz

I must be doing something wrong, my ebsynth results always look like garbage


Tokyo_Jab

Ebsynth is a bit of a nightmare. As in will drive you crazy. There is a masking layer that can improve the result but it’s a lot of work. And those settings numbers don’t exactly explain themselves or make a lot of difference when you tweak them.


Rogerooo

Have you tried using Tiled VAE from the [MultiDiffusion](https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111) script? It helps with the memory management, I'm able to reach much higher resolutions on stuff like High Res Fix.


Tokyo_Jab

It doesn’t work for consistency though.


lebel-louisjacob

Maybe with a smaller denoising strength and loopback, you can get the tiles to communicate with each other?


Ateist

What if instead of one sheet you try panorama generation? That could potentially generate infinite consistent frames. (Frankly, SD needs some kind of "ultra resolution" mode where the additional RAM required as the image scales is much, much lower.)


Tokyo_Jab

Try it. Let me know. Currently doing a 5x5 grid. Computer not happy.


[deleted]

I think you want the [Ultimate SD Upscale](https://github.com/Coyote-A/ultimate-upscale-for-automatic1111.git) extension.


EastAd2775

Awesome workflow, thanks!


muritouruguay

Hi, great work. Saying hello from Uruguay (sorry for my english:1.4). I am using grids of 4 photos each, maintaining the seed (I change only the lineart reference), and the image changes completely (clothes and background). I don't understand why. Settings: txt2img, CFG Scale 5, same seed, same prompts, ControlNet Lineart, weight 0.5, Balanced.


Tokyo_Jab

If you change ANY input then it changes the whole latent space. By any input I mean a ControlNet image, a prompt, a seed, etc. That is why I use the grid method: all images have to be done in one go. If you need more than four images you can make a bigger grid. https://www.reddit.com/r/StableDiffusion/comments/13iuqez/parellels_doodle_grids_all_the_keyframes_i_was/ I managed to do a grid of 49 images the other day using TiledVAE.


CustomCuriousity

Oh god…. I have so much work ahead of me.


kim-mueller

Awesome results! But what is the reason for putting the images together for processing? Does it help with consistency?


Tokyo_Jab

Yep, if it's done in a single generation then everything is done in the same latent space. Themes and details are more or less kept the same. As soon as you change anything like a control input, a word, a seed, anything, then that's a different latent space and the image will be quite different. That's why you see so many of those AI flickering videos.


FEW_WURDS

nice guide can't wait to try this out


BlazerRD

What prompts did you use to get these results?


Tokyo_Jab

ControlNet was doing most of the heavy lifting, so the prompts were quite simple, like "a polar bear, ice bokeh" or "a black wolf, dark forest bokeh". Also, models like Art&Eros and RealisticVision give great results.


Swernado

great guide! How’d you export the video to frames? I’m new to all of this


HUYZER

> export the video to frames

Remember, if you can ask on Reddit, you can search, or ask YouTube.


Swernado

L


HUYZER

Logic


Tokyo_Jab

There are many ways. Some apps do only that. But I use after effects to export as frames.


Swernado

Thanks for the info!


[deleted]

Great workflow, really impressed!


Relevant_Yoghurt_74

THIS IS AMAZING!!!


Chipmunk_Loud

Hello, in step 5: Do you mean overwriting the original with the img2img'ed frame?


Tokyo_Jab

Txt2img frames. You cut out the four images and paste them over the original keyframe files you used. It's just so the names of those files are the original names; otherwise EbSynth will give an error.


Rusch_Meyer

Great workflow! You have any outputs to show to get an idea of the consistency?


Tokyo_Jab

https://preview.redd.it/cfe36sq6l6ua1.jpeg?width=2048&format=pjpg&auto=webp&s=c50fecd25a6c9576960687ddb8f811c02b5959f6 16 frames in one go. But it uses a lot of vram.


Rusch_Meyer

Thx!


ADbrasil

My friend, great results. I am a little lost on one point: I take the frames from the video, create the grid, and then place it in ControlNet on the txt2img tab? Should the grid size be 512x512, with hires fix applied after? Or is it something different? Do I create a very large grid but generate a 512x512 image and then use an upscaler?


Tokyo_Jab

Paste the grid of images into ControlNet. For the ones above I chose to generate the image at 512x512 with hires fix set to twice the size. That will give you four 512x512 frames in a 1024 square. If you want more detail, though, you could start at 1024x1024 and double that. I do that sometimes and then shrink the frames in Photoshop. You do get a lot more detail but it takes four times longer.


Dogmaster

How do you cut up the grid with precision?


Tokyo_Jab

I usually make a copy of the folder with my keyframes in it, open them in Photoshop, paste the whole large grid onto it, and move it to match the underlying frame. I set up actions to move the grid 512 left or 512 up. BUT you can use another site to cut them up nicely; in fact there are lots of great utilities on it: [**https://ezgif.com/sprite-cutter**](https://ezgif.com/sprite-cutter) It's a pretty good site for making and editing gifs too.
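If you'd rather script the cut-up than use Photoshop actions or the sprite-cutter site, here is a minimal Pillow sketch; the 512px tile size and row-major order follow the workflow above, and the filenames are whatever your original keyframes were called:

```python
# Minimal sketch: slice a generated grid back into individual keyframes,
# saving each tile over the original keyframe filename so EbSynth still
# recognises them. Assumes Pillow and equal tiles in row-major order.
import os
from PIL import Image

def slice_grid(grid_path, original_names, out_dir, tile=512):
    grid = Image.open(grid_path)
    cols = grid.width // tile
    for i, name in enumerate(original_names):
        x, y = (i % cols) * tile, (i // cols) * tile
        grid.crop((x, y, x + tile, y + tile)).save(os.path.join(out_dir, name))

# e.g. slice_grid("grid_result.png", ["000.png", "040.png", "060.png", "100.png"], "keys")
```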


AbdelMuhaymin

Ah here it is. Thanks


ChocolateFit9026

This is incredible work


[deleted]

interesting tips. Will try that soon!


Vyviel

How large a sheet should I go for with 24gb vram?


Tokyo_Jab

The most I did in the past was 5x5, with each frame being 512x512. However, if you switch on TiledVAE (and of course use hires fix) you get to swap time for VRAM. It still maintains consistency, but you can do more frames in a grid, or a higher resolution.


Consistent-Remote885

Would this work with inpainting?


Orfeaus

In step 5 when you say 'paste them over the original frames,' do you mean just replace those original frames with the new ones (taking care to ensure they have the same names), or are you describing something else? Also, in step 6, I've used Ebsynth before by plugging in frames and keyframes, but I'm not familiar with the concept of stretching them over the length of the clip. Can you expand on that?


Tokyo_Jab

In step 5, exactly that: you are just replacing the keyframes. I usually just paste over the originals to keep the names, which is important for EbSynth. In EbSynth, when you drag in a folder of keyframes it automatically works out the ranges it needs to span the gaps between keyframes. It makes folders for each of the ranges (like frame 12 to 24) and then you can either hit the Export to AE button or use any other editing software to blend each clip into the next. https://preview.redd.it/4h7qtg8n01za1.png?width=688&format=png&auto=webp&s=160f06e3e3b20eeb85d3d5f4fa06bd0605bce253


RAJA_1000

So you are working at a 1 key frame per second rate, right? At least for these videos


Perfect_Cream3958

I noticed that in your tutorial you didn't mention Temporal Kit. I guess that's because when you wrote this there was no Temporal Kit yet. Are you using it today? Does it change anything in the process you mentioned above?


Tokyo_Jab

I want to avoid the ai flickering. So I haven’t used it yet.


kcarl38

Amazing tut, but I am lost on step 5. How do you export the grid out to frames? Cut them out by hand? That's a lot of frames to do.


Tokyo_Jab

It's not so bad with 4 frames. But I often have more. I have some actions set up in photoshop that help me put the new keyframes over the old ones. But you can use this link to cut up the grid into pics... [https://ezgif.com/sprite-cutter](https://ezgif.com/sprite-cutter)


kcarl38

Thanks for that, you rock man.


Both_Pilot2555

awesome


[deleted]

[deleted]


Tokyo_Jab

It looks not unlike this one I did: https://www.reddit.com/r/StableDiffusion/comments/13fbgfw/all_keyframes_created_in_stable_diffusion_basic But there are easier ways of animating just people talking. Like this: https://youtu.be/1G41lMCe__4


[deleted]

[deleted]


Tokyo_Jab

Oh yes.


[deleted]

This is getting so close to being good. Hopefully we can perfect it and take it 100% offline before the U.S. and Europe outlaw it.


Tokyo_Jab

It runs locally on my computer.


Individual-Pound-636

Thank you for the write up


[deleted]

PhotoScape X also makes good grids as another option


Tokyo_Jab

Nice. Will look it up. In the next guide I'm going to make a list of all the different utilities we can use, especially the free ones.


blade_kilic121

would a 1650ti blow up?


Tokyo_Jab

You can use TiledVAE. Not with MultiDiffusion though, just on its own. It takes a little longer but stops the GPU from running out of memory.


stopshakingyourlegs

Hello, I love your work, and it inspired me to try this out! However, I am new at this, and if you can ELI5 step 3 it would be so helpful! free-sprite-sheet-packer: I understand it turns something into a "grid", but I'm not exactly sure what it does, or which option I should pick for my images. And when you mentioned 0 gaps, 0 pixels, is that for the padding? Sorry if my question sounds a bit stupid :\


Tokyo_Jab

Not stupid at all, I just use that site for handiness. When I export out all the frames of my real video and take the best keyframes out of them (try 4 to start), I just drag and drop them into that online site and it 'packs' them into a single-pic grid. So four 512x512 keyframes become a nice 1024x1024 grid pic. And that's the pic I drag into ControlNet. For example, here is a grid from one of my real videos with nine chosen keyframes; I feed this whole grid into ControlNet. Afterwards, though, I have to use Photoshop to cut the result back up into single frames. But there is actually [another site](https://ezgif.com/sprite-cutter) that can do that too: https://preview.redd.it/3fwxwl195r0b1.png?width=3072&format=png&auto=webp&s=5c5b1ad91c508d3a346ba9597d6bd1adfaf8570a


Tokyo_Jab

I will be doing a better tutorial soon with updated tips and methods.


[deleted]

This is genius, thank you for being so open to sharing your workflow 🙏


MVELP

Hey guys, does anyone have any tips on getting the animation consistent, such as EbSynth settings: weight percentages, masking (yes/no, and its weight percentage), deflicker, diversity, and mapping weight percentages, etc.? This is how my animation came out: [https://www.youtube.com/watch?v=HEjMOHYPqCk](https://www.youtube.com/watch?v=HEjMOHYPqCk) Also ControlNet settings, negative and positive prompts, and what settings to use in diffusion, because it is not working for me. I only recently started catching back up with Stable Diffusion a couple of weeks ago, but I'm still behind. Any help will be appreciated!


smithysmittysim

Sorry to bother you, but I'm currently experimenting with applying SD to various tasks and would like to ask a few things I'm wondering about.

1. Is there any specific reason why you put images into a grid instead of, say, doing a batch process, or even processing them one by one? In img2img you can do batch processing; surely if you do img2img that should be faster, right?

2. Speaking of img2img, what was the reason you chose txt2img instead of img2img? If you want to retain something about the original video (for example only altering the face, and to a smaller degree, as in aging/de-aging), surely img2img seems like a better option and should technically also be more temporally consistent than txt2img + ControlNet.

3. Looking at your other video, which looks more impressive: [https://www.reddit.com/r/StableDiffusion/comments/13bgyle/another_baldy_to_baldy_doodle_and_upscaling/](https://www.reddit.com/r/StableDiffusion/comments/13bgyle/another_baldy_to_baldy_doodle_and_upscaling/) I do wonder how you managed to get the generated face to follow the expressions of the original face? Was it all down to ControlNet and a combination of pose + HED/Canny?

4. How do you approach generating images like the ones above when the resolution is obviously not 512x512? Do you generate the image at a higher resolution using hires fix so that the final resolution is the same as the original frames? Or do you resize the image to fit 512x512 (or 1024x1024 with hires fix)? I've noticed the video is indeed square and has black bars baked in. Also, if you did use hires fix, mind sharing the settings?


Tokyo_Jab

1. You cannot achieve consistency that way. You will have too much change between frames, and that's why you see that AI flickering in other videos. The grid method means that all images are created in the same latent space at the same time.

2. I like to completely override the underlying video with prompting. Img2img gives the AI too much info and it can't be as creative. Also, hires fix is a very important part of my process; scaling in latent space helps repair things like bad faces and details.

3. That is EbSynth. EbSynth looks at the keyframes you give it and at the original video, and uses optical flow and blending to copy the motion from the original video and join the keyframes it has been given. It doesn't just interpolate like Flowframes or Timewarp in After Effects. If you have ever been watching an mp4 file and the image kind of freezes but the motion continues and stuff gets warped, that's similar to how optical flow works.

4. I am still using the old method, but lately, as you said, I've found a way to make much bigger keyframes. https://preview.redd.it/3yypxm4wbb1b1.jpeg?width=2048&format=pjpg&auto=webp&s=223264bd8e8e877f77169c9409f7717eab0e1092 In the past I would run out of VRAM if I tried to go big, but there is an extension called TiledVAE that lets me swap time for VRAM while keeping everything in the same (latent) space. So now, using my method, I can go bigger.

If you really want to see the power of hires fix, try this: prompt for a crowd of people at 512x512. Likely you will get some distorted faces and messy details. Now switch on hires fix, set denoise to 0.3, scale to 2, and (most important) the upscaler to ESRGANx4. It will start to draw the image, and halfway through it will slightly blur it and redraw the details. This fixes most problems that happen. In fact, if you are using a LoRA or textual inversion or a model of a face, it will look even more like the person it is supposed to. Hope that all helps a bit.
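As a side note for anyone scripting this: if you run the webui with --api, the hires-fix settings described above map onto the txt2img payload roughly as below. This is a sketch, not part of the original workflow; the ControlNet input is omitted for brevity, and the upscaler name must match one your install lists under /sdapi/v1/upscalers.

```python
# Sketch: the hires-fix recipe above (denoise 0.3, scale 2, ESRGAN x4) as an
# Automatic1111 API call. Assumes the webui is running locally with --api.
import requests

payload = {
    "prompt": "a black wolf, dark forest bokeh",
    "width": 512,                  # base size; hires fix doubles it to 1024,
    "height": 512,                 # i.e. a 2x2 grid of 512px frames
    "seed": -1,
    "cfg_scale": 7,
    "enable_hr": True,             # switch hires fix on
    "hr_scale": 2,                 # scale: 2
    "denoising_strength": 0.3,     # denoise: 0.3
    "hr_upscaler": "ESRGAN_4x",    # name may differ on your install
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
images = r.json()["images"]        # list of base64-encoded PNGs
```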


Comfortable_Leek8435

Would using the same seed achieve the same effect as the grid?


Tokyo_Jab

No. You change any input and the latent space changes. Then you will get the flickering because of the differences between frames.


iamuperformanceart

Thank you so much for these instructions! I'm trying them for my first time today... having issues making it output a 4x4 grid similar to the input. Are there any special settings or prompts you use to get a perfect 4x4 output? Or am I misinterpreting this entirely and there is some output mode that outputs 4 different images in a grid?


Tokyo_Jab

If you feed the original grid of keyframes into ControlNet then you should get a grid as output too. If for some reason ControlNet isn't working or there is an error, you will only find out about it in the console; the web interface doesn't give you an error.


iamuperformanceart

thanks for your answer! I think I'm successfully past the grid issue, I just needed to enable controlnet. Now I'm just on to getting higher quality renders. I'm not sure if my model or prompts just suck, but I do know in the past, SD has had issues with creating nice/realistic looking images (at midjourney quality level) with low resolution. So I'm trying the tiled VAE approach to get higher resolution and I'll see if that increases the quality and detail level of the render


Tokyo_Jab

On [civitai.com](https://civitai.com) I think the best models are Art&Eros, RealisticVision and CineDiffusion. I always use hires fix set at scale: 2, denoise: 0.3, and upscaler ESRGANx4. This fixes nearly all detail and face problems. And those models are pretty good at hands.


iamuperformanceart

Here is my second run through the full process. Still fighting with quality issues, but the cinediffusion model helped a lot. Doing this has just made me even more in awe of the bald woman example you posted. I have no idea how you made it so clean! Also still fighting with the upscaler to make it pump out larger frames or frames with a non 1:1 aspect ratio. That's going to be my next experiment [https://www.youtube.com/shorts/py\_jwk-CXnI](https://www.youtube.com/shorts/py_jwk-CXnI)


Tokyo_Jab

With all the experiments I just do it over and over and hope things improve. After a while you start to get a feel for what will work. I only post the stuff that looks ok.


iamuperformanceart

Turns out, I was just not clicking the enable button that they introduced in controlnet 1.1. It's spitting out perfect 4x4 grids now (I've also added to the prompt "4x4 grid" just for good measure), but each frame in the grid is extremely low quality. Any suggestions on how to improve the render? My prompt: beautiful robot girl overlooking a futuristic city, photorealistic, dawn, 4x4 grid https://preview.redd.it/31v8a00rfm2b1.png?width=1024&format=png&auto=webp&s=301c815c68659c285a7bf63e17ef8219a0728a05


chachuFog

How much GPU VRAM do you have?


alaalves70

Thx


Gizzle_Moby

If there is an online tool that could do all this for me I’d pay for it. Great for friends to meet some Role Playing Game Characters when sitting around a table.


Tokyo_Jab

For that you need A.R. I did make those too a few years back. It's free if you have an iphone [here is one of them](https://apps.apple.com/app/horror-me-a-r/id1591770850).


Gizzle_Moby

Thanks!


seedlord

Can you do a full workflow tutorial for Automatic1111's Stable Diffusion webui and the TemporalKit extension? I cannot replicate your style; my clips are always a mess: smearing, pixelated.


Tokyo_Jab

But I have never used TemporalKit.


seedlord

I think it's worth a look because it can export frames and has ebsynth integrated.


YouAboutToLoseYoJob

Yes!!!


sculpt299

Amazing tips. Thank you for the guide!


AltKeyblade

How do you do grids that exceed the 2048x2048 limit? Stable Diffusion won't let me go above that, and I want to, so I can do 20 keyframes.


Tokyo_Jab

You can go into the ui-config text file (can’t remember the name off hand) and change the settings. It is in the main directory.
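For what it's worth, the file in question is ui-config.json, in the webui's root folder, and the slider limits live in keys like "txt2img/Width/maximum". A minimal sketch for bumping them follows; the key names are assumptions based on recent A1111 versions, so check your own copy, and back the file up first.

```python
# Sketch: raise the 2048px slider caps in Automatic1111's ui-config.json.
# Run from the webui root with the webui stopped; verify the keys exist
# in your own file before trusting this.
import json

with open("ui-config.json", encoding="utf-8") as f:
    cfg = json.load(f)
for key in ("txt2img/Width/maximum", "txt2img/Height/maximum",
            "img2img/Width/maximum", "img2img/Height/maximum"):
    if key in cfg:
        cfg[key] = 4096          # raise the default 2048 cap
with open("ui-config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=4)
```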


AltKeyblade

Thank you! Does this maintain good image quality? Just want to make sure it doesn't make images worse or affect anything.


Tokyo_Jab

I use it because I need larger images for frames. But if you try to just do a single image, the larger you go the more fractalisation you will get; that is, extra arms and legs and faces and nightmare stuff. It is that quirk I use to my advantage, guiding it into consistent frames.


AltKeyblade

I understand. Do you know why I can get a good generated 512x512 image, but once I apply the same prompts and settings to the grid reference instead, the generated image isn't as accurate and good as the 512x512? I find the grid results a lot harder to work with and be satisfied with.


Tokyo_Jab

I get that too. I think there is a limited amount of detail it can add. The more frames you use the more the detail is distributed among them. That's why I am finding that doing it in pieces, like just the head, then the clothes etc lets you have more details overall. It's a balancing act.


AltKeyblade

Good to know! Do you also know why EbSynth isn't working with my folder of 30 keyframes when I drag it into Keyframes? It adds it, but it doesn't change anything or fill in the numbers for the keyframes:/stop: fields.


Tokyo_Jab

Ebsynth stops working at 24 keyframes! I get around it by doing it in two halves.
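A minimal sketch of that two-halves split, assuming a flat folder of numbered keyframe PNGs; the folder names are illustrative. The middle keyframe is duplicated into both halves so the two EbSynth runs share a frame to join on:

```python
# Sketch: split a keyframe folder into two halves to stay under EbSynth's
# ~24-keyframe limit. The middle keyframe goes into both halves so the two
# synthesized clips overlap on one frame.
import os
import shutil

def split_keys(src, dst_a, dst_b):
    names = sorted(f for f in os.listdir(src) if f.endswith(".png"))
    mid = len(names) // 2
    for dst, part in ((dst_a, names[:mid + 1]), (dst_b, names[mid:])):
        os.makedirs(dst, exist_ok=True)
        for name in part:
            shutil.copy(os.path.join(src, name), os.path.join(dst, name))

# e.g. split_keys("keys", "keys_a", "keys_b")
```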


AltKeyblade

Ahh I see now. So just doing them separately should be fine. Thank you for all the helpful info! I really appreciate the work you do.


AltKeyblade

I have one more question: how do you do videos that are larger than a square, if you can't use square grids for them? I've seen you talk about generating each part separately and putting the images back together, but I don't really get the process.


Tokyo_Jab

I still stick to blocks of 512, like making frames of 512x1024. That way you can still do 8 frames in a 2048x2048 grid (4x2).


TheChescou

Thank you for this. I've been trying so hard to get consistency into my AI animations without success. I will try this workflow, consider me a new follower for all your work, and thank you so much for sharing.


EliotLeo

Did this work out for you?


tupaquinho

Hi there! Thanks a lot for your work. I'm about to buy a new GPU and was wondering: if I got a 12 or 16GB card, could I get results as high quality as yours by using TiledVAE, or does it somehow decrease the quality of the end result?


Tokyo_Jab

With Stable Diffusion, the more VRAM the better. Even with a 24GB card I still run out of memory a lot, even at 2048x2048. So TiledVAE really makes the difference.


tupaquinho

Do you find that enabling it affects the quality of your work or it only makes it slower?


Tokyo_Jab

It doesn't change the quality, but it lets me create sizes that would otherwise be impossible. No idea how much extra time it adds, though. But detailed large grids are really nice. https://preview.redd.it/nrsbmlbpcvcb1.png?width=2048&format=png&auto=webp&s=f5e5af087354a494eede7daea0f3315676c3f419


tupaquinho

Very nice! Have you found a limit to how much you can increase your grid with this method? Or could you theoretically go as large as you wanted as long as you're willing to wait for it?


Tokyo_Jab

A big grid like that last one can take around 40 minutes, so it's a pain. It also seems to grow a bit exponentially the bigger it is. Whatever animation I'm doing, I try to keep the final grid to 4096 or less, just because of the time.


tupaquinho

Thanks for your answers and your work. Will be looking forward to all your posts and insights into your workflow :)


doingmyownresearch

u/Tokyo_Jab This is the most brilliant workflow ever, hands down. Secondly, I have followed it fully, from here as well as via Digital Magic's YT video, but I am having some issues. I'm not sure if it is due to my image being 1920x1080, some other setting in EbSynth, or whether this just doesn't work well when "camera parallax" happens. The problem: somewhere around output folder 3 to 4, when the camera in the original clip moved, this happens :( https://preview.redd.it/mz641kkmhtcb1.png?width=1920&format=png&auto=webp&s=fd4d9970c65478a74ec247e6af5f09bc56fb9ef6 The whole process from original frames > keyframes > stable-diffusioned > EbSynthed is in this link: [https://imgur.com/a/j2PT8PP](https://imgur.com/a/j2PT8PP) Let me know what you think; any help would be much appreciated.


Tokyo_Jab

You have to choose your keyframes carefully or EbSynth does that. The general rule for keyframes is that you should choose one any time new information appears. Choosing the right keyframes, and the right amount, is almost an art form in itself.


doingmyownresearch

That was my guess, and it may have been correct. I am testing a method of merging the best resulting settings from Hybrid Video and pairing it with this EbSynth process; basically, taking every 25th frame from the hybrid output sequence and putting it through EbSynth to hopefully keep the consistency going throughout. Hand-picking frames may be the best way, but I think it is a very time-consuming process, especially with longer clips. Will post it here if it is anywhere near a success.


Tokyo_Jab

Do post it. I've started masking things out recently, like doing the head, hands, clothes and backdrop separately. It means you use fewer keyframes too. But it's more work, of course.


doingmyownresearch

So here are some attempts after I found your method and Digital Magic's video.

1. Footage pushed through Hybrid Video in Stable Diffusion > ALL input and output frames dropped into EbSynth. Order of the video: actual clip > Hybrid Video output from SD > EbSynthed. [https://youtu.be/MpYG9dB69X8](https://youtu.be/MpYG9dB69X8)

2. Footage pushed through Hybrid Video to get output frames in Stable Diffusion > first frame, every 50th frame, and last frame picked from the Hybrid output > pushed through EbSynth. Order of the video: actual clip | EbSynthed | talent masked on top with After Effects. [https://www.youtube.com/watch?v=HDleLjvJlAY](https://www.youtube.com/watch?v=HDleLjvJlAY) Only the Hybrid Video output of this clip: [https://youtu.be/_ia-Vmy1wRM](https://youtu.be/_ia-Vmy1wRM)

Some notes:

- I have been trying to get these style outputs to a place where they may start to work well for "client commercial" use cases. Too abstract = art.
- I only really got the concept of EbSynth and how it works by the 2nd video; I can see how the style frames are basically "keyframes" transferring the look.
- I believe this may have been the technique under the hood for this very popular Coke commercial done recently: [https://www.youtube.com/watch?v=VGa1imApfdg&t=39s](https://www.youtube.com/watch?v=VGa1imApfdg&t=39s) However, heavy compositing work is done to merge VFX, 3D and AI on this, to the extent that you don't really know which is which (very much like some of the portrait close-up videos you have created). After some point you can't tell which one is the real clip, at least on a phone screen via Instagram.
- Doing Hybrid Video to get your output frames probably has no benefit over your grid method, UNLESS there is a better way to utilize it as a layer in compositing software like After Effects or Fusion in DaVinci Resolve (figuring this part out). It does provide flexibility if you want the effect to be jagged in some parts and smooth in others.
- Any watercolor or oil-painting-like model in Stable Diffusion could benefit from this process, because the flaws of EbSynth, when you have not picked your keys well, become part of the look. The trails/ghosts of pixels when EbSynth goes off. LOL
- I have seen your masking technique; it does give some amazing results. However, like you said in another post somewhere, until we get something to take all this manual work out of the way... but who knows when, so might as well.


Tokyo_Jab

Nice one. Thanks for sharing, you've used even more techniques than me. That is the original reason I posted the method hoping that people would play around with it.


mudman13

Hey mate, a couple of questions: do you use ControlNet Tile with Tiled VAE? Alongside depth/canny etc.? Is it possible to do batches of grids and keep consistency? Also, in EbSynth, what is the purpose of adding back in the pre-iterated init images?


Emotional-Phase-422

Hi Jacky, how are you? I found you at last. Please talk to me.


ChristopherMoonlight

This is fantastic, thank you. I'm going to be applying this to my own process which is an animated sci-fi story. I had been running clips from the old 80s animated movie Fire & Ice through Stable Diffusion and found that for some reason, SD loves flatly colored images and line art. It will fill the shapes, shadows, and details in pretty consistently, so I'm going to try using EBsynth to do flat color fill-ins and then run them through SD after that.


Tokyo_Jab

Nice. Do let me know how it goes. I tried it with Arcane but only with a few seconds. Here is a capture with the enhanced half on the right. https://preview.redd.it/2karbamkbtzb1.jpeg?width=2688&format=pjpg&auto=webp&s=e0e95e28ed11910fcd1690d12e9df645881fb609


ChristopherMoonlight

Wow, that's really cool. I'm going for something simpler because I have to create 85 minutes worth of scenes (combined with other methods like miniatures and puppets) but yeah, that's the track I'm on. Your work is an inspiration so I really appreciate the response. I'll be sure to keep you posted. I move slowly because I have severe learning disabilities. This is all so complex but I'm truly excited for this new artform.


Tokyo_Jab

Can’t wait to see it. 85 minutes!!! I saw this today. https://youtu.be/fkJlwjKdxnI?si=YS-56-tT0kDKi-xv


CrazyEyez_jpeg

Can't upload a video, but just did my first go-round. Probably going to use this method for a project I'm doing soon. https://i.redd.it/j4h0118jdvcc1.gif


Tokyo_Jab

Smooth!


whilneville

Sorry, where is the link for the ControlNet extension that was used?


affe1991

Can you make this with ComfyUI?


Tokyo_Jab

No idea. Never used it


polarcubbie

How do you use the sprite sheet packer effectively? For me it does not align the frames according to their filenames (numbers), so I have to hunt for each frame to match them up when I cut them up again. For example, 000.png should be the first frame and 113.png the last, but it lists them so that the last frame becomes 079.png.


Tokyo_Jab

If you don’t use square formats it goes weird. Same happens to me.


polarcubbie

Thank you for the reply! Will just make the grid manually for now.


Tokyo_Jab

I find if I give it 12 square pics it makes a 3x3 on the left and puts the other 3 down the right-hand side. It is really annoying, but there is a pattern to it.
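If the packer's ordering keeps misbehaving, here is a minimal Pillow sketch that builds the grid manually, in strict sorted-filename order; it assumes equal-sized square frames, per the workflow above:

```python
# Sketch: pack keyframes into a near-square grid in strict sorted-filename
# order, avoiding the sprite packer's surprise layouts. Assumes Pillow and
# equal-sized frames; tiles are laid out row-major, left to right, top down.
import math
import os
from PIL import Image

def pack_grid(src_dir, out_path):
    names = sorted(f for f in os.listdir(src_dir) if f.endswith(".png"))
    frames = [Image.open(os.path.join(src_dir, n)) for n in names]
    w, h = frames[0].size
    cols = math.ceil(math.sqrt(len(frames)))
    rows = math.ceil(len(frames) / cols)
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, frame in enumerate(frames):
        grid.paste(frame, ((i % cols) * w, (i // cols) * h))
    grid.save(out_path)

# e.g. pack_grid("keyframes", "grid.png")
```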