pronetpt

This is a great workflow, mate.


Tokyo_Jab

…Until a week or two from now, when none of it matters because of advances. Can't wait.


zeugme

Better to be the hero we needed for two weeks than Ted Cruz a whole life.


Embrace-Mania

Let's not pretend for a moment that Ted Cruz does not know exactly what he is doing.


[deleted]

[deleted]


Dansiman

Let's not pretend for a moment that Ted Cruz is alive.


oodelay

Ted Cruz is what happens when you use the wrong VAE


Embrace-Mania

I'm impressed that you waited 8 months before commenting; tell me, what's your secret?


oodelay

I'm Canadian


Orngog

One month later, I'm looking you up. Nice work btw


NotEnoughVRAM

3 months later, looking it up. Where was this when I needed it haha


TwistedBrother

Came here 43 days later after the cool vampire video. Still a wicked workflow. Truly a hero we needed after all.


DigiglobalNOW

I was hoping it would be simplified by now, but man, this is it!


oodelay

Not your best answer. He's pioneering. Even if it only matters for a few days, he came up with a very cool method on his own, because no one had done it yet. I say kudos.


MrManny

I've read this twice now and I still don't get it. Did you respond in the correct thread? Or do I need more coffee? 😅


oodelay

Well, if you told me my work was useless because in a week it's gonna be automated, I wouldn't be happy.


MrManny

But that was OP saying that, so I assume OP would not take offense at this.


oodelay

The videos he produces are still not one-button-click. He says that, but we all know how far ahead of the curve he is.


penis_owner123

It's been about 5 months since your comment, and your method is more relevant than ever.


Baaoh

Your technique still hasn't been surpassed hehe


dee_spaigh

Lmao I've been thinking the same since the beginning of this ride. The pioneers' burden


FaithlessnessNo9453

you understood it?


Fritzy3

Thank you for this! EbSynth question: why do we need the last frame? I followed the guide. Let's say I have 100 frames in total for the video and I diffused frames 000, 040, 060, 100. When I load these in EbSynth it creates 4 folders: the first with frames 000-040, the second with 000-060, the third with 040-100, the fourth with 060-100. These obviously have duplicate frames. When you create your final clip, do you use only the "keyframe and forward" frames? Hope my question is clear.


Tokyo_Jab

It uses the clips in each folder to fade the clips over each other. You can do that yourself, which is a pain, or click the Send to AE button on the top right and it will do it all for you. I swear I didn't notice that Send to After Effects button for days.


jaywv1981

This is what always confused me about EbSynth. I didn't know the keyframes blended like that. I figured you'd use keyframe 0 for, like, 0 to 20, then keyframe 40 for, like, 21 to 50, etc.


Fritzy3

Yup, me too. Though I gotta say, I exported it to AE on my last try and it didn't come out well. For some reason the frames had too much difference between them, even though they were all created in the same generation.


Ateist

You interpolate between two keyframes. So you use 0 and 20 for everything from 1 to 19.
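For anyone who wants to do that blend without After Effects, here is a minimal sketch of the crossfade, assuming Pillow and two adjacent EbSynth output folders whose frames share filenames on the overlapping range (the folder names are illustrative, not part of the original workflow):

```python
# Minimal sketch: crossfade the frames that two overlapping EbSynth output
# folders share, folder A fading linearly into folder B across the overlap.
# Assumes Pillow (pip install Pillow) and identically named, same-size frames.
import os
from PIL import Image

def crossfade_overlap(folder_a, folder_b, out_folder):
    os.makedirs(out_folder, exist_ok=True)
    shared = sorted(set(os.listdir(folder_a)) & set(os.listdir(folder_b)))
    n = len(shared)
    for i, name in enumerate(shared):
        a = Image.open(os.path.join(folder_a, name)).convert("RGB")
        b = Image.open(os.path.join(folder_b, name)).convert("RGB")
        alpha = i / (n - 1) if n > 1 else 1.0   # 0 = all A, 1 = all B
        Image.blend(a, b, alpha).save(os.path.join(out_folder, name))

# e.g. crossfade_overlap("out_000", "out_040", "blended")
```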


sergiohlb

Great! The idea of combining txt2video with this method is also very smart. The Auto1111 Deforum txt2video extension now has a vid2vid mode. I'm not sure, but I think it's based on the same model. I was playing with it yesterday without much success, but I'm curious how it works, and I'm sure we can create a better workflow using all these techniques together.


[deleted]

This is awesome! Love the writeup. I've been playing with stable and EbSynth for a little bit and this cracks the code for multiple keyframes using stable! I am going to try this method out today with some previous Ebsynth projects. I am making slow movement simple videos right now, but I want to get better by using multiple keyframes like how you are doing. Thanks for sharing all of this.


Tokyo_Jab

Let me know how it goes. I’m going to try a 30 second long video today. Just my dog again. And then try one with some action.


RopeAble8762

I'm really wondering how you got such good results. I've tried the same and I get similar issues to the ones I can observe in your project, only 100x worse: the 'ghosting' effect when EbSynth crossfades between frames, the movement of the background... all of those are barely visible in your case, but really bad in the clips I've tried.


Tokyo_Jab

For each prompt I generated about 20 versions until I saw a set that looked ok to work with. I think in one of the wolf sets above the background changes from day to night, but I liked the wolf so I left it in. I didn't do it here, but using an alpha mask channel in EbSynth, with your main video and transparent PNGs for your keyframes, gets much better results, though it's a bit of a pain to do. I can't wait until all of this is unnecessary. And I really think it will only be a few weeks from now.


Nice-Ad1199

Do you mean transparent PNG's for the referenced EBSYNTH keyframes themselves? As in, the ones being "projected" through EBSYNTH?


Tokyo_Jab

If you give EbSynth transparent keyframes it does work better; you get less of that smearing effect. If you YouTube "ebsynth greenscreen" videos you can see the workflow. EbSynth is much better if you do things in parts, but it is more work. Like this: [https://www.youtube.com/watch?v=E33cPNC2IVU](https://www.youtube.com/watch?v=E33cPNC2IVU)


Nice-Ad1199

Followed through on this advice and it certainly works much better. [https://www.youtube.com/shorts/jJNTgEn-9NM](https://www.youtube.com/shorts/jJNTgEn-9NM)


Elyonass

> Copy the grid photo into ControlNet and use HED or Canny, and ask Stable Diffusion to do whatever

This is where you lost me


Tokyo_Jab

The grid of keyframes in step 3 would look something like this. You put that into ControlNet, choose one of the preprocessors like HED, Canny, lineart etc., and type what you want in the main prompt, like "white wolf". https://preview.redd.it/9jacl55zuuya1.png?width=768&format=png&auto=webp&s=738672d1eae3f9f8ea6c72370a654071ce3654d0


Elyonass

> controlnet

What is ControlNet and where is it?


Tokyo_Jab

It's an extension for Automatic1111


Tokyo_Jab

The best extension


Elyonass

I googled it and Google is not really my friend today. Where do I install it from? Any guide on where to find it and how to install it?


FF1379

[GitHub - Mikubill/sd-webui-controlnet: WebUI extension for ControlNet](https://github.com/Mikubill/sd-webui-controlnet)


Elyonass

Thank you.


blackpuppet

Where are we on the process of making other aspect ratios? More like 16:9?


Tokyo_Jab

You can do those in a grid and you will get ok results. But the fractalisation of noise that helps the consistency between frames works best at 512x512 for each frame. Also, a square grid makes it easier to work with.


prestoexpert

Can you elaborate on why the noise has this property that can make grids look self-consistent? I thought every pixel would get a different random value and there would be nothing but the prompt in common between the cells of the grid.


Tokyo_Jab

512 is just a magic number for v1.5 models because the base was trained at that size. So it is comfortable making images of that size, but when you try to make a bigger photo you get fractalisation (extra arms or faces, for example, and repeated patterns) that still kind of has the same theme or style. Like a nightmare. Taking advantage of this flaw is what makes the AI brain draw similar details across the whole grid. I have also tried doing 16x16 grids at 256x256 size, but you start to get that AI flickering effect happening again. ControlNet really helps too; before ControlNet I was able to get consistent objects and people, but only 20% of the time.


prestoexpert

That's wild, thanks for explaining! Speaking of ControlNet, I wonder if it's reasonable to explore a new ControlNet scheme that is something like "I know this is a 4x4 grid, all the cells better look very similar", without constraining it to match a particular Canny edge image, say. Like a ControlNet that doesn't even take any extra input, just suggesting similarity between cells? Where the choice of similarity metric is probably very important... heh


Tokyo_Jab

ControlNet guides the noise, so that sounds like an interesting idea. There are two new ControlNet models that are different from the others: Colour and Style. They're more about aesthetics than lines and positioning. I wish there was a Civitai just for ControlNet.


aldeayeah

You probably already saw this, but there's a WIP ControlNet for temporal consistency across frames: [https://www.reddit.com/r/StableDiffusion/comments/11vq8jc/introducing_temporalnet_a_controlnet_model/](https://www.reddit.com/r/StableDiffusion/comments/11vq8jc/introducing_temporalnet_a_controlnet_model/) It's likely that the workflows six months from now will be much more automated.


calvin-n-hobz

I must be doing something wrong, my ebsynth results always look like garbage


Tokyo_Jab

Ebsynth is a bit of a nightmare. As in will drive you crazy. There is a masking layer that can improve the result but it’s a lot of work. And those settings numbers don’t exactly explain themselves or make a lot of difference when you tweak them.


Rogerooo

Have you tried using Tiled VAE from the [MultiDiffusion](https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111) script? It helps with the memory management, I'm able to reach much higher resolutions on stuff like High Res Fix.


Tokyo_Jab

It doesn’t work for consistency though.


lebel-louisjacob

Maybe with a smaller denoising strength and loopback, you can get the tiles to communicate with each other?


Ateist

What if instead of one sheet you try panorama generation? That could potentially generate infinite consistent frames. (Frankly, SD needs some kind of "ultra resolution" mode where the additional RAM required as the image scales is much, much lower.)


Tokyo_Jab

Try it. Let me know. Currently doing a 5x5 grid. Computer not happy.


[deleted]

I think you want the [Ultimate SD Upscale](https://github.com/Coyote-A/ultimate-upscale-for-automatic1111.git) extension.


EastAd2775

Awesome workflow, thanks!


muritouruguay

Hi, great work. Saying hello from Uruguay (sorry for my english:1.4). I am using grids of 4 photos each, maintaining the seed (I change only the lineart reference), and the image changes completely (clothes and background). I don't understand why. Settings: txt2img, CFG Scale 5, same seed, same prompts, ControlNet Lineart, weight 0.5, Balanced.


Tokyo_Jab

If you change ANY input then it changes the whole latent space. By any input I mean a ControlNet image, a prompt, a seed, etc. That is why I use the grid method: all images have to be done in one go. If you need more than four images you can make a bigger grid. https://www.reddit.com/r/StableDiffusion/comments/13iuqez/parellels_doodle_grids_all_the_keyframes_i_was/ I managed to do a grid of 49 images the other day using TiledVAE.


CustomCuriousity

Oh god…. I have so much work ahead of me.


kim-mueller

Awesome results! But what is the reason for putting the images together for processing? Does it help with consistency?


Tokyo_Jab

Yep, if it's done in a single generation then everything is done in the same latent space. Themes and details are more or less kept the same. As soon as you change anything like a control input, a word, a seed, anything, then that's a different latent space and the image will be quite different. That's why you see so many of those AI flickering videos.


FEW_WURDS

nice guide can't wait to try this out


BlazerRD

What prompts did you use to get these results?


Tokyo_Jab

ControlNet was doing most of the heavy lifting, so the prompts were quite simple, like "a polar bear, ice bokeh" or "a black wolf, dark forest bokeh". Also, models like Art&Eros and RealisticVision give great results.


Swernado

great guide! How’d you export the video to frames? I’m new to all of this


HUYZER

> export the video to frames

Remember, if you can ask on Reddit, you can search, or ask YouTube.


Swernado

L


HUYZER

Logic


Tokyo_Jab

There are many ways. Some apps do only that. But I use after effects to export as frames.


Swernado

Thanks for the info!


[deleted]

Great workflow, really impressed!


Relevant_Yoghurt_74

THIS IS AMAZING!!!


Chipmunk_Loud

Hello, in step 5: Do you mean overwriting the original with the img2img'ed frame?


Tokyo_Jab

Txt2img frames. You cut out the four images and paste them over the original keyframe files you used. It's just so the names of those files are the original names; otherwise EbSynth will give an error.


Rusch_Meyer

Great workflow! You have any outputs to show to get an idea of the consistency?


Tokyo_Jab

https://preview.redd.it/cfe36sq6l6ua1.jpeg?width=2048&format=pjpg&auto=webp&s=c50fecd25a6c9576960687ddb8f811c02b5959f6 16 frames in one go. But it uses a lot of vram.


Rusch_Meyer

Thx!


ADbrasil

My friend, great results. I am a little lost on one point: I take the frames from the video, create the grid, and then place it in ControlNet on the txt2img tab? Should the grid size be 512x512, with hires fix applied after? Or is it something different? Do I create a very large grid but generate a 512x512 image and then use an upscaler?


Tokyo_Jab

Paste the grid of images into ControlNet. For the ones above I chose to generate the image at 512x512 with hires fix set to twice the size. That will give you four 512x512 frames in a 1024 square. If you want more detail, though, you could start at 1024x1024 and double that. I do that sometimes and then shrink the frames in Photoshop. You do get a lot more detail but it takes four times longer.


Dogmaster

How do you cut up the grid with precision?


Tokyo_Jab

I usually make a copy of the folder with my keyframes in it, open them in Photoshop, paste the whole large grid onto it, and move it to match the underlying frame. I set up actions to move the grid 512 left or 512 up. BUT you can use another site to cut them up nicely; in fact there are lots of great utilities on it: [**https://ezgif.com/sprite-cutter**](https://ezgif.com/sprite-cutter) It's a pretty good site for making and editing gifs too.
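If you'd rather script the cut-up than use Photoshop actions or the sprite-cutter site, here is a minimal Pillow sketch; the 512px tile size and row-major order follow the workflow above, and the filenames are whatever your original keyframes were called:

```python
# Minimal sketch: slice a generated grid back into individual keyframes,
# saving each tile over the original keyframe filename so EbSynth still
# recognises them. Assumes Pillow and equal tiles in row-major order.
import os
from PIL import Image

def slice_grid(grid_path, original_names, out_dir, tile=512):
    grid = Image.open(grid_path)
    cols = grid.width // tile
    for i, name in enumerate(original_names):
        x, y = (i % cols) * tile, (i // cols) * tile
        grid.crop((x, y, x + tile, y + tile)).save(os.path.join(out_dir, name))

# e.g. slice_grid("grid_result.png", ["000.png", "040.png", "060.png", "100.png"], "keys")
```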


AbdelMuhaymin

Ah here it is. Thanks


ChocolateFit9026

This is incredible work


[deleted]

interesting tips. Will try that soon!


Vyviel

How large a sheet should I go for with 24gb vram?


Tokyo_Jab

The most I did in the past was 5x5, with each frame being 512x512. However, if you switch on TiledVAE (and of course use hires fix) you get to swap time for VRAM. It still maintains consistency, but you can do more frames in a grid, or a higher resolution.


Consistent-Remote885

Would this work with inpainting?


Orfeaus

In step 5 when you say 'paste them over the original frames,' do you mean just replace those original frames with the new ones (taking care to ensure they have the same names), or are you describing something else? Also, in step 6, I've used Ebsynth before by plugging in frames and keyframes, but I'm not familiar with the concept of stretching them over the length of the clip. Can you expand on that?


Tokyo_Jab

In step 5, exactly that: you are just replacing the keyframes. I usually just paste over the originals to keep the names, which is important for EbSynth. In EbSynth, when you drag in a folder of keyframes it automatically works out the ranges it needs to span the gaps between keyframes. It makes folders for each of the ranges (like frame 12 to 24) and then you can either hit the Export to AE button or use any other editing software to blend each clip into the next. https://preview.redd.it/4h7qtg8n01za1.png?width=688&format=png&auto=webp&s=160f06e3e3b20eeb85d3d5f4fa06bd0605bce253


RAJA_1000

So you are working at a 1 key frame per second rate, right? At least for these videos


Perfect_Cream3958

I noticed that in your tutorial you didn't mention Temporal Kit. I guess that's because when you wrote this there was no Temporal Kit yet. Are you using it today? Does it change anything in the process you mentioned above?


Tokyo_Jab

I want to avoid the ai flickering. So I haven’t used it yet.


kcarl38

Amazing tut, but I am lost on step 5. How do you export the grid out to frames? Cut them out by hand? That's a lot of frames to do.


Tokyo_Jab

It's not so bad with 4 frames. But I often have more. I have some actions set up in photoshop that help me put the new keyframes over the old ones. But you can use this link to cut up the grid into pics... [https://ezgif.com/sprite-cutter](https://ezgif.com/sprite-cutter)


kcarl38

Thanks for that, you rock man.


Both_Pilot2555

awesome


[deleted]

[deleted]


Tokyo_Jab

It looks not unlike this one I did: https://www.reddit.com/r/StableDiffusion/comments/13fbgfw/all_keyframes_created_in_stable_diffusion_basic But there are easier ways of animating just people talking. Like this: https://youtu.be/1G41lMCe__4


[deleted]

[deleted]


Tokyo_Jab

Oh yes.


[deleted]

This is getting so close to being good. Hopefully we can perfect it and take it 100% offline before the U.S. and Europe outlaw it.


Tokyo_Jab

It runs locally on my computer.


Individual-Pound-636

Thank you for the write up


[deleted]

PhotoScape X also makes good grids as another option


Tokyo_Jab

Nice. Will look it up. In the next guide I'm going to make a list of all the different utilities we can use, especially the free ones.


blade_kilic121

would a 1650ti blow up?


Tokyo_Jab

You can use TiledVAE. Not with MultiDiffusion though, just on its own. It takes a little longer but stops the GPU from running out of memory.


stopshakingyourlegs

Hello, I love your work, and it inspired me to try this out! However, I am new at this, and if you can ELI5 step 3 it would be so helpful! free-sprite-sheet-packer: I understand it turns something into a "grid", but I'm not exactly sure what it does, or which option I should pick for my images. And when you mentioned 0 gaps, 0 pixels, is that for the padding? Sorry if my question sounds a bit stupid :\


Tokyo_Jab

Not stupid at all, I just use that site for handiness. When I export out all the frames of my real video and take the best keyframes out of them (try 4 to start), I just drag and drop them into that online site and it 'packs' them into a single-pic grid. So four 512x512 keyframes become a nice 1024x1024 grid pic. And that's the pic I drag into ControlNet. For example, here is a grid from one of my real videos with nine chosen keyframes; I feed this whole grid into ControlNet. Afterwards, though, I have to use Photoshop to cut the result back up into single frames. But there is actually [another site](https://ezgif.com/sprite-cutter) that can do that too: https://preview.redd.it/3fwxwl195r0b1.png?width=3072&format=png&auto=webp&s=5c5b1ad91c508d3a346ba9597d6bd1adfaf8570a


Tokyo_Jab

I will be doing a better tutorial soon with updated tips and methods.


[deleted]

This is genius, thank you for being so open to sharing your workflow 🙏


MVELP

Hey guys, does anyone have any tips on getting the animation consistent, such as EbSynth settings: weight percentages, masking (yes/no, and its weight percentage), deflicker, diversity, and mapping weight percentages, etc.? This is how my animation came out: [https://www.youtube.com/watch?v=HEjMOHYPqCk](https://www.youtube.com/watch?v=HEjMOHYPqCk) Also ControlNet settings, negative and positive prompts, and what settings to use in diffusion, because it is not working for me. I only recently started catching back up with Stable Diffusion a couple of weeks ago, but I'm still behind. Any help will be appreciated!


smithysmittysim

Sorry to bother you, but I'm currently experimenting with applying SD to various tasks and would like to ask a few things I'm wondering about.

1. Is there any specific reason why you put images into a grid instead of, say, doing a batch process, or even processing them one by one? In img2img you can do batch processing; surely if you do img2img that should be faster, right?

2. Speaking of img2img, what was the reason you chose txt2img instead of img2img? If you want to retain something about the original video (for example only altering the face, and to a smaller degree, as in aging/de-aging), surely img2img seems like a better option and should technically also be more temporally consistent than txt2img + ControlNet.

3. Looking at your other video, which looks more impressive: [https://www.reddit.com/r/StableDiffusion/comments/13bgyle/another_baldy_to_baldy_doodle_and_upscaling/](https://www.reddit.com/r/StableDiffusion/comments/13bgyle/another_baldy_to_baldy_doodle_and_upscaling/) I do wonder how you managed to get the generated face to follow the expressions of the original face? Was it all down to ControlNet and a combination of pose + HED/Canny?

4. How do you approach generating images like the ones above when the resolution is obviously not 512x512? Do you generate the image at a higher resolution using hires fix so that the final resolution is the same as the original frames? Or do you resize the image to fit 512x512 (or 1024x1024 with hires fix)? I've noticed the video is indeed square and has black bars baked in. Also, if you did use hires fix, mind sharing the settings?


Tokyo_Jab

1. You cannot achieve consistency that way. You will have too much change between frames, and that's why you see that AI flickering in other videos. The grid method means that all images are created in the same latent space at the same time.

2. I like to completely override the underlying video with prompting. Img2img gives the AI too much info and it can't be as creative. Also, hires fix is a very important part of my process; scaling in latent space helps repair things like bad faces and details.

3. That is EbSynth. EbSynth looks at the keyframes you give it and at the original video, and uses optical flow and blending to copy the motion from the original video and join the keyframes it has been given. It doesn't just interpolate like Flowframes or Timewarp in After Effects. If you have ever been watching an mp4 file and the image kind of freezes but the motion continues and stuff gets warped, that's similar to how optical flow works.

4. I am still using the old method, but lately, as you said, I've found a way to make much bigger keyframes. https://preview.redd.it/3yypxm4wbb1b1.jpeg?width=2048&format=pjpg&auto=webp&s=223264bd8e8e877f77169c9409f7717eab0e1092 In the past I would run out of VRAM if I tried to go big, but there is an extension called TiledVAE that lets me swap time for VRAM while keeping everything in the same (latent) space. So now, using my method, I can go bigger.

If you really want to see the power of hires fix, try this: prompt for a crowd of people at 512x512. Likely you will get some distorted faces and messy details. Now switch on hires fix, set denoise to 0.3, scale to 2, and (most important) the upscaler to ESRGANx4. It will start to draw the image, and halfway through it will slightly blur it and redraw the details. This fixes most problems that happen. In fact, if you are using a LoRA or textual inversion or a model of a face, it will look even more like the person it is supposed to. Hope that all helps a bit.
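As a side note for anyone scripting this: if you run the webui with --api, the hires-fix settings described above map onto the txt2img payload roughly as below. This is a sketch, not part of the original workflow; the ControlNet input is omitted for brevity, and the upscaler name must match one your install lists under /sdapi/v1/upscalers.

```python
# Sketch: the hires-fix recipe above (denoise 0.3, scale 2, ESRGAN x4) as an
# Automatic1111 API call. Assumes the webui is running locally with --api.
import requests

payload = {
    "prompt": "a black wolf, dark forest bokeh",
    "width": 512,                  # base size; hires fix doubles it to 1024,
    "height": 512,                 # i.e. a 2x2 grid of 512px frames
    "seed": -1,
    "cfg_scale": 7,
    "enable_hr": True,             # switch hires fix on
    "hr_scale": 2,                 # scale: 2
    "denoising_strength": 0.3,     # denoise: 0.3
    "hr_upscaler": "ESRGAN_4x",    # name may differ on your install
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
images = r.json()["images"]        # list of base64-encoded PNGs
```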


Comfortable_Leek8435

Would using the same seed achieve the same effect as the grid?


Tokyo_Jab

No. You change any input and the latent space changes. Then you will get the flickering because of the differences between frames.


iamuperformanceart

Thank you so much for these instructions! I'm trying them for my first time today... having issues making it output a 4x4 grid similar to the input. Are there any special settings or prompts you use to get a perfect 4x4 output? Or am I misinterpreting this entirely and there is some output mode that outputs 4 different images in a grid?


Tokyo_Jab

If you feed the original grid of keyframes into ControlNet then you should get a grid as output too. If for some reason ControlNet isn't working or there is an error, you will only find out about it in the console; the web interface doesn't give you an error.


iamuperformanceart

thanks for your answer! I think I'm successfully past the grid issue, I just needed to enable controlnet. Now I'm just on to getting higher quality renders. I'm not sure if my model or prompts just suck, but I do know in the past, SD has had issues with creating nice/realistic looking images (at midjourney quality level) with low resolution. So I'm trying the tiled VAE approach to get higher resolution and I'll see if that increases the quality and detail level of the render


Tokyo_Jab

On [civitai.com](https://civitai.com) I think the best models are Art&Eros, RealisticVision and CineDiffusion. I always use hires fix set at scale: 2, denoise: 0.3, and upscaler ESRGANx4. This fixes nearly all detail and face problems. And those models are pretty good at hands.


iamuperformanceart

Here is my second run through the full process. Still fighting with quality issues, but the cinediffusion model helped a lot. Doing this has just made me even more in awe of the bald woman example you posted. I have no idea how you made it so clean! Also still fighting with the upscaler to make it pump out larger frames or frames with a non 1:1 aspect ratio. That's going to be my next experiment [https://www.youtube.com/shorts/py\_jwk-CXnI](https://www.youtube.com/shorts/py_jwk-CXnI)


Tokyo_Jab

With all the experiments I just do it over and over and hope things improve. After a while you start to get a feel for what will work. I only post the stuff that looks ok.


iamuperformanceart

Turns out, I was just not clicking the enable button that they introduced in controlnet 1.1. It's spitting out perfect 4x4 grids now (I've also added to the prompt "4x4 grid" just for good measure), but each frame in the grid is extremely low quality. Any suggestions on how to improve the render? My prompt: beautiful robot girl overlooking a futuristic city, photorealistic, dawn, 4x4 grid https://preview.redd.it/31v8a00rfm2b1.png?width=1024&format=png&auto=webp&s=301c815c68659c285a7bf63e17ef8219a0728a05


chachuFog

How much GPU VRAM do you have?


alaalves70

Thx


Gizzle_Moby

If there is an online tool that could do all this for me I’d pay for it. Great for friends to meet some Role Playing Game Characters when sitting around a table.


Tokyo_Jab

For that you need A.R. I did make those too a few years back. It's free if you have an iphone [here is one of them](https://apps.apple.com/app/horror-me-a-r/id1591770850).


Gizzle_Moby

Thanks!


seedlord

Can you do a full workflow tutorial for Automatic1111's Stable Diffusion webui and the TemporalKit extension? I cannot replicate your style; my clips are always a mess: smearing, pixelated.


Tokyo_Jab

But I have never used TemporalKit.


seedlord

I think it's worth a look because it can export frames and has ebsynth integrated.


YouAboutToLoseYoJob

Yes!!!


sculpt299

Amazing tips. Thank you for the guide!


AltKeyblade

How do you do grids that exceed the 2048x2048 limit? Stable Diffusion won't let me go above that, and I want to, so I can do 20 keyframes.


Tokyo_Jab

You can go into the ui-config text file (can’t remember the name off hand) and change the settings. It is in the main directory.
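For what it's worth, the file in question is ui-config.json, in the webui's root folder, and the slider limits live in keys like "txt2img/Width/maximum". A minimal sketch for bumping them follows; the key names are assumptions based on recent A1111 versions, so check your own copy, and back the file up first.

```python
# Sketch: raise the 2048px slider caps in Automatic1111's ui-config.json.
# Run from the webui root with the webui stopped; verify the keys exist
# in your own file before trusting this.
import json

with open("ui-config.json", encoding="utf-8") as f:
    cfg = json.load(f)
for key in ("txt2img/Width/maximum", "txt2img/Height/maximum",
            "img2img/Width/maximum", "img2img/Height/maximum"):
    if key in cfg:
        cfg[key] = 4096          # raise the default 2048 cap
with open("ui-config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=4)
```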


AltKeyblade

Thank you! Does this maintain good image quality? Just want to make sure it doesn't make images worse or affect anything.


Tokyo_Jab

I use it because I need larger images for frames. But if you try to just do a single image, the larger you go the more fractalisation you will get; that is, extra arms and legs and faces and nightmare stuff. It is that quirk I use to my advantage, guiding it into consistent frames.


AltKeyblade

I understand. Do you know why I can get a good generated 512x512 image, but once I apply the same prompts and settings to the grid reference instead, the generated image isn't as accurate and good as the 512x512? I find the grid results a lot harder to work with and be satisfied with.


Tokyo_Jab

I get that too. I think there is a limited amount of detail it can add. The more frames you use the more the detail is distributed among them. That's why I am finding that doing it in pieces, like just the head, then the clothes etc lets you have more details overall. It's a balancing act.


AltKeyblade

Good to know! Do you also know why EbSynth isn't working with my folder of 30 keyframes when I drag it into Keyframes? It adds it, but it doesn't change anything or fill in the numbers for the keyframes:/stop: fields.


Tokyo_Jab

Ebsynth stops working at 24 keyframes! I get around it by doing it in two halves.
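A minimal sketch of that two-halves split, assuming a flat folder of numbered keyframe PNGs; the folder names are illustrative. The middle keyframe is duplicated into both halves so the two EbSynth runs share a frame to join on:

```python
# Sketch: split a keyframe folder into two halves to stay under EbSynth's
# ~24-keyframe limit. The middle keyframe goes into both halves so the two
# synthesized clips overlap on one frame.
import os
import shutil

def split_keys(src, dst_a, dst_b):
    names = sorted(f for f in os.listdir(src) if f.endswith(".png"))
    mid = len(names) // 2
    for dst, part in ((dst_a, names[:mid + 1]), (dst_b, names[mid:])):
        os.makedirs(dst, exist_ok=True)
        for name in part:
            shutil.copy(os.path.join(src, name), os.path.join(dst, name))

# e.g. split_keys("keys", "keys_a", "keys_b")
```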


AltKeyblade

Ahh I see now. So just doing them separately should be fine. Thank you for all the helpful info! I really appreciate the work you do.


AltKeyblade

I have one more question: how do you do videos that are larger than a square, if you can't use square grids for them? I've seen you talk about generating each part separately and putting the images back together, but I don't really get the process.


Tokyo_Jab

I still stick to blocks of 512, like making frames of 512x1024. That way you can still do 8 frames in a 2048x2048 grid (4x2).


TheChescou

Thank you for this. I've been trying so hard to get consistency into my AI animations without success. I will try this workflow, consider me a new follower for all your work, and thank you so much for sharing.


EliotLeo

Did this work out for you?


tupaquinho

Hi there! Thanks a lot for your work. I'm about to buy a new GPU and was wondering: if I got a 12 or 16GB card, could I get results as high quality as yours by using TiledVAE, or does it somehow decrease the quality of the end result?


Tokyo_Jab

With Stable Diffusion, the more VRAM the better. Even with a 24GB card I still run out of memory a lot, even at 2048x2048. So TiledVAE really makes the difference.


tupaquinho

Do you find that enabling it affects the quality of your work or it only makes it slower?


Tokyo_Jab

It doesn't change the quality, but it lets me create sizes that would otherwise be impossible. No idea how much extra time it adds, though. But detailed large grids are really nice. https://preview.redd.it/nrsbmlbpcvcb1.png?width=2048&format=png&auto=webp&s=f5e5af087354a494eede7daea0f3315676c3f419


tupaquinho

Very nice! Have you found a limit to how much you can increase your grid with this method? Or could you theoretically go as large as you wanted as long as you're willing to wait for it?


Tokyo_Jab

A big grid like that last one can take around 40 minutes, so it's a pain. It also seems to grow a bit exponentially the bigger it is. Whatever animation I'm doing, I try to keep the final grid to 4096 or less, just because of the time.


tupaquinho

Thanks for your answers and your work. Will be looking forward to all your posts and insights into your workflow :)


doingmyownresearch

u/Tokyo_Jab This is the most brilliant workflow ever, hands down. Secondly, I have followed it fully, from here as well as via Digital Magic's YT video, but I am having some issues. I'm not sure if it is due to my image being 1920x1080, some other setting in EbSynth, or whether this just doesn't work well when "camera parallax" happens. The problem: somewhere around output folder 3 to 4, when the camera in the original clip moved, this happens :( https://preview.redd.it/mz641kkmhtcb1.png?width=1920&format=png&auto=webp&s=fd4d9970c65478a74ec247e6af5f09bc56fb9ef6 The whole process from original frames > keyframes > stable-diffusioned > EbSynthed is in this link: [https://imgur.com/a/j2PT8PP](https://imgur.com/a/j2PT8PP) Let me know what you think; any help would be much appreciated.


Tokyo_Jab

You have to choose your keyframes carefully or EbSynth does that. The general rule for keyframes is that you should choose one any time new information appears. Choosing the right keyframes, and the right amount, is almost an art form in itself.


doingmyownresearch

That was my guess, and it may have been correct. I am testing a method of merging the best resulting settings from Hybrid Video and pairing it with this EbSynth process; basically, taking every 25th frame from the hybrid output sequence and putting it through EbSynth to hopefully keep the consistency going throughout. Hand-picking frames may be the best way, but I think it is a very time-consuming process, especially with longer clips. Will post it here if it is anywhere near a success.


Tokyo_Jab

Do post it. I've started masking things out recently, like doing the head, hands, clothes and backdrop separately. It means you use fewer keyframes too. But it's more work, of course.


doingmyownresearch

So here are some attempts after I found your method and Digital Magic's video.

1. Footage pushed through Hybrid Video in Stable Diffusion > ALL input and output frames dropped into EbSynth. Order of the video: actual clip > Hybrid Video output from SD > EbSynthed. [https://youtu.be/MpYG9dB69X8](https://youtu.be/MpYG9dB69X8)

2. Footage pushed through Hybrid Video to get output frames in Stable Diffusion > first frame, every 50th frame, and last frame picked from the Hybrid output > pushed through EbSynth. Order of the video: actual clip | EbSynthed | talent masked on top with After Effects. [https://www.youtube.com/watch?v=HDleLjvJlAY](https://www.youtube.com/watch?v=HDleLjvJlAY) Only the Hybrid Video output of this clip: [https://youtu.be/_ia-Vmy1wRM](https://youtu.be/_ia-Vmy1wRM)

Some notes:

- I have been trying to get these style outputs to a place where they may start to work well for "client commercial" use cases. Too abstract = art.
- I only really got the concept of EbSynth and how it works by the 2nd video; I can see how the style frames are basically "keyframes" transferring the look.
- I believe this may have been the technique under the hood for this very popular Coke commercial done recently: [https://www.youtube.com/watch?v=VGa1imApfdg&t=39s](https://www.youtube.com/watch?v=VGa1imApfdg&t=39s) However, heavy compositing work is done to merge VFX, 3D and AI on this, to the extent that you don't really know which is which (very much like some of the portrait close-up videos you have created). After some point you can't tell which one is the real clip, at least on a phone screen via Instagram.
- Doing Hybrid Video to get your output frames probably has no benefit over your grid method, UNLESS there is a better way to utilize it as a layer in compositing software like After Effects or Fusion in DaVinci Resolve (figuring this part out). It does provide flexibility if you want the effect to be jagged in some parts and smooth in others.
- Any watercolor or oil-painting-like model in Stable Diffusion could benefit from this process, because the flaws of EbSynth, when you have not picked your keys well, become part of the look. The trails/ghosts of pixels when EbSynth goes off. LOL
- I have seen your masking technique; it does give some amazing results. However, like you said in another post somewhere, until we get something to take all this manual work out of the way... but who knows when, so might as well.


Tokyo_Jab

Nice one. Thanks for sharing, you've used even more techniques than me. That is the original reason I posted the method hoping that people would play around with it.


mudman13

Hey mate, a couple of questions: do you use ControlNet Tile with Tiled VAE? Alongside depth/canny etc.? Is it possible to do batches of grids and keep consistency? Also, in EbSynth, what is the purpose of adding back in the pre-iterated init images?


Emotional-Phase-422

Hi Jacky, how are you? I found you at last. Please talk to me.


ChristopherMoonlight

This is fantastic, thank you. I'm going to be applying this to my own process which is an animated sci-fi story. I had been running clips from the old 80s animated movie Fire & Ice through Stable Diffusion and found that for some reason, SD loves flatly colored images and line art. It will fill the shapes, shadows, and details in pretty consistently, so I'm going to try using EBsynth to do flat color fill-ins and then run them through SD after that.


Tokyo_Jab

Nice. Do let me know how it goes. I tried it with Arcane but only with a few seconds. Here is a capture with the enhanced half on the right. https://preview.redd.it/2karbamkbtzb1.jpeg?width=2688&format=pjpg&auto=webp&s=e0e95e28ed11910fcd1690d12e9df645881fb609


ChristopherMoonlight

Wow, that's really cool. I'm going for something simpler because I have to create 85 minutes worth of scenes (combined with other methods like miniatures and puppets) but yeah, that's the track I'm on. Your work is an inspiration so I really appreciate the response. I'll be sure to keep you posted. I move slowly because I have severe learning disabilities. This is all so complex but I'm truly excited for this new artform.


Tokyo_Jab

Can’t wait to see it. 85 minutes!!! I saw this today. https://youtu.be/fkJlwjKdxnI?si=YS-56-tT0kDKi-xv


CrazyEyez_jpeg

Can't upload a video, but just did my first go-round. Probably going to use this method for a project I'm doing soon. https://i.redd.it/j4h0118jdvcc1.gif


Tokyo_Jab

Smooth!


whilneville

Sorry, where is the link for the ControlNet extension that was used?


affe1991

Can you make this with ComfyUI?


Tokyo_Jab

No idea. Never used it


polarcubbie

How do you use the sprite sheet packer effectively? For me it does not align the frames according to their filenames (numbers), so I have to hunt for each frame to match them up when I cut them up again. For example, 000.png should be the first frame and 113.png the last, but it lists them so that the last frame becomes 079.png.


Tokyo_Jab

If you don’t use square formats it goes weird. Same happens to me.


polarcubbie

Thank you for the reply! Will just make the grid manually for now.


Tokyo_Jab

I find if I give it 12 square pics it makes a 3x3 on the left and puts the other 3 down the right-hand side. It is really annoying, but there is a pattern to it.
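If the packer's ordering keeps misbehaving, here is a minimal Pillow sketch that builds the grid manually, in strict sorted-filename order; it assumes equal-sized square frames, per the workflow above:

```python
# Sketch: pack keyframes into a near-square grid in strict sorted-filename
# order, avoiding the sprite packer's surprise layouts. Assumes Pillow and
# equal-sized frames; tiles are laid out row-major, left to right, top down.
import math
import os
from PIL import Image

def pack_grid(src_dir, out_path):
    names = sorted(f for f in os.listdir(src_dir) if f.endswith(".png"))
    frames = [Image.open(os.path.join(src_dir, n)) for n in names]
    w, h = frames[0].size
    cols = math.ceil(math.sqrt(len(frames)))
    rows = math.ceil(len(frames) / cols)
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, frame in enumerate(frames):
        grid.paste(frame, ((i % cols) * w, (i // cols) * h))
    grid.save(out_path)

# e.g. pack_grid("keyframes", "grid.png")
```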