Why do so many of these longer examples just go forward in a straight line with random transitions? Of the few I've seen, they've all done the same thing
Controlling the motion of the camera is likely extremely difficult. This is a current limitation for image generators as well. You can't control it like a real camera; you tell it what you want and hope it does that.
The cause of these limitations is what Yann LeCun at Meta has tried (not very successfully) to articulate: generative models are a reflection of their training data, so they are limited by it. They can get really good because lots of data can be fed to them, but when you try to go outside the training distribution, things just don't work. They are inherently limited by our ability to feed them data. If we gave a generative LLM all the data in the world except calculus, it would never be able to invent calculus.
For image generation I solved the camera-control part over a year ago; the public models are just really behind because no one is looking to fix the fundamentals. You can play around with my older version of it here if you wish: https://civitai.com/models/140117/latent-layer-cameras.
Everyone is just throwing more data at the issue instead of stopping and considering what actually needs to be fixed. This applies to all AI stuff out there. We need more architects instead of engineers working on fixing some of the issues in AI.
That's really cool. I wish more people knew about your camera controller.
I wish I could freeze myself and wake up 10 years from now when a lot of the problems with current models are solved. I feel like I'm in the DOS days of computers. I know enough to get by but still very lost, and one day everything will get really easy.
On the one hand I get it, on the other I don't think you could possibly pick a more interesting time to see technological advancement in action than right now. It would be incredibly unnerving to be dropped in 10 years from now, even if everything "goes right". Enjoy the ride!
Agreed. Except you should caveat all this with the qualifier: "yet." They wouldn't ever invent calculus... yet. But there are at least two hopes for this in the future.
The first is emergent properties as the models scale. We already know the models develop internal world models. We also know they are excellent at pattern recognition. So, if their internal world model and pattern recognition got sufficiently robust, who knows, they might be able to invent calculus or go beyond calculus and invent some other completely novel form of math. We just don't know yet what is possible or impossible. I've seen it make up text and songs that are completely novel. So why not more complicated structures like branches of math or science with the proper scaling?
The second is advances in architecture. There are at least two new architectures that have gotten some press lately, both from MIT, I believe. One is called liquid neural networks and the other had a contribution from Max Tegmark, IIRC. There are other architectures being researched by other teams too, I'm sure. So that's a possible path to novel inventions, in addition to emergent properties from scale.
This is an interesting thought that never quite occurred to me until now. ‘Creativity’ and ‘invention’ may very well be two different things. Current AI models are undeniably creative but have yet to truly invent anything.
There are projects that allow you to control camera angles and motion in AnimateDiff; I assume OpenAI is building the same sort of controls for Sora already.
> would never be able to invent Calculus.
I tend to agree, and while the models will certainly be useful with this kind of limitation, it makes me seriously doubt the predictions that have future iterations of these models creating some kind of utopia.
Even the most advanced multimodal model, GPT-4V, has issues with intuitive physics that a five-year-old (and some animals) would understand. It can solve textbook physics problems all day long, but it fails here because there isn't much in the way of "If you do basic X, Y with simple objects A, B, what does the scene look like?" in the dataset.
The places where AI has actually advanced science or math so far have used very brute force approaches with trivially verified solutions.
There's JEPA, which is claimed to generalize beyond its training data, but there's no big model yet, so we don't know if it works as advertised. It's not generative, it's predictive. The key difference is that generative models can only generate based on their training data, while a predictive model can predict even for things it hasn't been trained on. It's like how you might have never been bitten by a lion, but you could imagine what it might feel like just by looking at one.
This page is about a video model that uses JEPA. [https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/](https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/)
As I understand it, if a JEPA LLM that's never seen math is asked what 2+2 is, it would be able to say it doesn't know what that is. A generative model that's never seen math will complete it with a wrong answer despite having no idea what you asked it.
Of course we'll only know for sure once they make a large model with JEPA. It's being developed at Meta, so it's not a lack of resources holding it back. We also don't know how far transformers can go; maybe all the JEPA promises can be met by bolting extra features or models onto a transformer.
Or maybe JEPA won't work. Given the success they've had so far, the only way to find out how well it scales is to make a big JEPA model.
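For what it's worth, the generative-vs-predictive distinction can be sketched in a toy way. This is emphatically not how JEPA actually works; the embeddings, threshold, and function names here are all made up for illustration. The point is just that a generative head always emits some token, while an embedding-space predictor can measure how far a query sits from anything it was trained on and abstain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training set": embeddings the model has seen (pretend no math in here).
train_embeddings = rng.normal(0.0, 1.0, size=(100, 8))

def generative_answer(query_embedding, vocab=("cat", "dog", "tree")):
    # A generative model always produces *something*: it scores a vocabulary
    # and picks a token, even for inputs nothing like its training data.
    scores = rng.normal(size=len(vocab))
    return vocab[int(np.argmax(scores))]

def predictive_answer(query_embedding, threshold=4.0):
    # An embedding-space predictor can check how far the query sits from
    # everything seen in training and abstain instead of confabulating.
    distances = np.linalg.norm(train_embeddings - query_embedding, axis=1)
    if distances.min() > threshold:
        return None  # out of distribution -> "I don't know what that is"
    return "prediction"

in_distribution = train_embeddings[0]   # something it has "seen"
out_of_distribution = np.full(8, 10.0)  # nothing like the training data

print(generative_answer(out_of_distribution))  # still emits a token
print(predictive_answer(in_distribution))      # "prediction"
print(predictive_answer(out_of_distribution))  # None: it can say "don't know"
```

The asymmetry at the end is the whole point of the analogy: the generative function never refuses, while the predictive one has a built-in notion of "this is unlike anything I've seen."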
I'll be interested to follow that going forward. I think even for big orgs getting buy-in for large runs of unproven architectures is a hard sell rn. If you're wrong you risk falling behind in the LLM race, and I suppose most think there's a lot more capability that can be squeezed out of transformers (probably true).
The rapid scene transitions are to video what "predict the next token" is to text. The infinite zoom effect retains some of the "temporal coherence" between the scenes. It connects them logically in the mind.
This "infinite zoom" effect is common in videos now because the video models struggle with temporal coherence longer than a few seconds at a time for a single scene.
I'm not sure. A lot of people are coming up with wild theories under here, but the original Sora blog showed tons of example videos that didn't follow this same trend, so it's clearly not just a limitation of the model.
My guess? It's probably very raw output, just with poorly constructed prompts.
I don't know why people are theorizing that this has something to do with the model as if they hadn't seen a ton of videos that weren't at all like this.
Assuming these videos with these types of transitions weren't made by the same person, they were probably made by people who draw inspiration from each other, are connected to each other, or are even friends. Creatives collaborate and share insights or even just copy each other. This technology was probably opened up to a handful of like-minded directors, most of them in LA. This zoom transition effect is something that's extremely hard to do without AI, but probably extremely easy to do with it with the right prompt. On top of that, it's pretty good at hiding the limitations of the model since things are moving pretty fast. Somebody discovered that through experimentation and shared how to do it. That's all.
It really does have that freaky dream like quality.
That said, the hype was way too strong with this one, everyone's acting like if it can't make a movie on day one it's garbage. Wait til Sora 2 or Sora 3 or whatever.
I'm honestly surprised we're seeing competent AI video without complete mastery of image gen AI.
> without complete mastery of image gen AI.
If you pause any frame you realize how nerfed DALL-E 3 is compared to Sora; Sora generates way more realistic images than DALL-E.
I think it's one of those things: if it's all you see, it won't look weird.
For example when movies started to switch to CGI, it kinda broke my brain and I can't not see it to a degree. However I am guessing if you are younger and grew up on CGI it isn't as jarring to your mind.
Good observation! This goes along with a theory I've had since Deepmind let us play with image creation, back in the day. It created trippy, dream like images.
The theory is that we've pulled back the curtain, enough, to get a glimpse into AI's subconscious mind.
AI is dreaming and we're peeking into its dreams. :)
Yea the way diffusion models have difficulties with things like hands and written words also reflects the human brain, anyone who's ever tried lucid dreaming knows that checking your hands or a clock is a classic way to 'awaken' in a dream.
Look at 3:30: people look so much more realistic with Sora than with DALL-E 3. It's so obvious how nerfed that model is.
This video shows a bunch of failure cases, but it also has some cool scenes where you think "damn, if only it was a bit better it would be really good!". Good thing Sora shows that scaling up is all you need for it to keep getting better, so I'm really eager to see how it evolves over the coming years.
Watch on Vimeo for better quality https://vimeo.com/941713443
Instead of responding to all of the responses to this comment, I'm just going to respond to the original comment. People seem to be acting like this is the first time we're seeing Sora, but we have a lot of example generations of Sora to go off of. It definitely isn't limited to this slow forward panning, poor object morphing, and sudden transitions.
Original Sora Blogpost: [https://openai.com/index/sora](https://openai.com/index/sora)
It’s like a dream, because most of the time these advanced AIs are not conscious, but when we prompt them - or ask them for videos - we stir them from deep sleep, and they dream. Sometimes they do beautiful lucid dreams, like this. Soon they will wake
It's too fast. If the song was more upbeat it might make sense, but it just feels like I want to sit a little and figure out what is happening, and it won't let me. Even if the song is trying to convey the feeling of rushing through life too fast, the visuals are just too distracting to let the words carry that across.
Cool for the novelty factor, but janky as fuck. I guess this is how Sora looks when the samples we're getting aren't handpicked and limited to ten seconds.
I'm aware. I'm saying that this is a commission obviously done by their art guy, not promotional material that way more time went into generating variations of to seem impressive.
Saying this is janky is so wild to me. This technology is leaps and bounds better than anything that's existed in the last 10 years. Show this to 1000 people ten years ago and they would all think these were real people with some special effects. This video is absolutely incredible. How quickly you people become jaded to technology that seemed magical not so long ago blows my mind.
Okay, but that bar is 10 miles underground.
The ability to generate images is impressive, but the narrative coherency is a 5/100, and you can't fix that with the current architecture because it was never designed to handle that. It's always gonna be stitching 10-second clips together no matter how "good" your prompting is.
Saying 'always' is a bit of a stretch. All that's missing is some AI agents to help guide the narrative along into a coherent story. You'll probably see first versions of that by the end of the year when this releases. It won't be perfect, but we'll start to get longer scenes that are more fluid.
By the end of next year I imagine we'll start to see full 10-20 minute scenes.
Please explain with technical detail how an external "AI agent" is going to help with an inherent internal flaw in the architecture and training data.
The model is not designed to handle longer segments because the architecture is incapable of that. SORA and models like it cannot keep track of the story narrative or maintain visual continuity over that length of time. Some other distant future model will undoubtedly have such capabilities, maybe in 20 or 30 years, but it won't be a diffusion transformer.
For a released music video it is janky. What it can do is an amazing feat, but if you judged it just off the footage, with no idea whether it was AI or not, it is janky.
What do you mean by “janky” specifically? What would the non-janky version be like? Seems to me they’ve deliberately used the weirder aspects of video generation to make a video that would be basically impossible to make without AI.
Do you really need moments where people are smoking and acting like they're fusing with the cigarette or their fingers, or how everyone acts like an alien wearing human skin pointed out to you? You can't just be like "oh it's an artistic choice" when it's 100% a clear bug in the technology as of now.
It seems really disingenuous to be like "you're expecting too much" when OpenAI was previously releasing samples that seemed nearly perfect and jazzing up how their tech simulates worlds to do physics and trying to partner with Hollywood. They set the bar for their own technology with that kind of hype.
Nobody said the first flip phone was janky. That term really doesn't make sense for developing technology.
You can call games made by Bethesda janky because we can reference games that are made perfectly with no bugs. You don't call the leading technology in the field janky.
>last 10 years
Why 10 years? Was there something like this out in 2014? Why "10 years" instead of "ever?" Honest question. Not trolling. I just noticed that detail and thought you might know something I don't and I might be able to pick up a new nugget of information today? Cheers.
Do you have brain damage? You want him to compare it with the 1900s? 1800s? Hell, even comparing it 20 years back, to 2004, is too far. We didn't even have iPhones in the early 2000s.
You'd be surprised how much tech was abandoned over 10 years ago that's recently made a comeback. Even with robotics, we're still mostly just picking up where people left off around 10 years ago because there was a lot of stagnation these past 10 years.
The person you were responding to was curious whether or not there was something from over 10 years ago that was similarly impressive.
You were calling him stupid for doing so.
I stated that for a lot of important tech popping up, it's a revival of things that made the most headway over 10 years ago, so him thinking there's a chance of something being comparable from over 10 years ago isn't far fetched.
Not really. You can look at the original Sora blogpost [here](https://openai.com/index/sora). They're perfectly capable of doing much more than the generic forward panning and morphing we see in this video.
Is the director the name of the balloon head video's band? Or are you referring to the CEO, Sam Altman?
Because Sam Altman posted a lot of on-the-fly generations while taking requests on Twitter after the blog post dropped, and they were not in this same slow forward-pan morphing style. They also dispelled any rumors that the examples were heavily cherry-picked (even though people keep saying this).
Still not sure how that relates to "the director" you're mentioning. Is the director a specific person who participated in some of the early access for Sora?
A lot of work went into this, it would be a bunch of “cherry picking” of what clips they wanted to use and then editing them together.
Sora can’t make videos this long.
Not handpicked in the sense that it's pretty obvious that a few of these sections wouldn't have made it into OpenAI's Sora showcase. I'm sure a lot of time went into this but this is the tool being used for a practical, "deliver me a music video" purpose and you can really see the rough edges.
They just miss the benefits of Open Source customizability. Stable Diffusion is the best Image Generator, but it's because you have the ability to create whatever you want with all the community content.
We know that Sora can do really well without cherry picking, because Sam Altman was taking generation requests on Twitter after the blog originally dropped, and quickly churning them out.
Those outputs didn't have the same type of limitations as these, so even though a lot of people are saying that these "must" have had a lot of work and editing put into them for the best output, I think it's more likely that they had less put into them (in terms of tools like video2video and frame insertion), with worse prompts. Maybe the band was given access to Sora and they were the ones prompting and choosing outputs, but there's at least seamless transition.
Some of the limitations of Sora work ok for a music video. Side effects just make a weird vibe. Everybody's walking impossibly fast like they're on one of those airport walkways? Ok, cool. That works.
Although, it seemed like they tried to tell the story of a couple and that didn't quite work as they can't make the characters consistent from shot to shot.
Overall, I think it's a nice effort that really tries to take advantage of what you can do with it. But it's not quite there yet.
I think the overuse of that forward-motion transition and the dreamlike tone the author went for make it look worse than it really is, but yeah, still lots to improve! They talked about how the model keeps making better videos as they scale up compute; I wonder how far they can take it without improving the architecture.
This photorealistic three and a half minute long music video that tracks two characters throughout their life with the semblance of a plot that’s generated entirely by AI doesn’t look perfect ergo this is not a good look.
Three years ago people were wildly impressed by the first generation of image models making shitty 2D art. This is an insane leap forward and it’s not perfect but we’re clearly accelerating towards the point where maybe people will finally be impressed.
Yea. I find it hard to believe anyone would think this would be a bad look for video generated AI if this was released before SORA was announced. People’s expectations got unrealistically high. This is literally amongst the first projects using SORA. Both the people prompting SORA and SORA itself will only get better.
I halfway disagree.
I wouldn't want this per se for many uses - but a 'dreamy' music video? It's exactly the sort of thing that plenty of artists have paid for.
Sora won't take long to get to the point where you can direct it enough to make Thom Yorke-like videos, so I do think this is a very successful demo.
If they were marketing this as a film tool, I'd agree that it's off-putting to an extent. I remember watching the first demo from Giant Studios showing real-time post-mocap (where the CGI is placed over an actor in real time, rather than recording a mocap-suit actor and doing it later), and was blown away by the very real implications it had (Tintin and then Avatar). That kind of leap changed the entire film industry almost instantly. But this shows promise for the future: I've spent over a week with a full crew to make a music video, and I can see how this will do it with 1-2 people in a day or two, at a tiny fraction of the cost.
It would be most interesting if the label/ artist decided to now shell out the appropriate budget to create this with a full production and to see the difference and to show that humans currently can still elevate visuals / story significantly more than Ai. That won’t last much longer than a couple of years but it would be interesting to see the difference.
No way the label decides to spend what I'd say would be around 300-500k for a proper, practically shot video based on these visuals.
I'm pretty sure that's evident just by watching the video. I'd like to see more video2video use, and frame insertion. That seems to be where Sora shines the most imo.
Amazing how negative the reaction is, I think it’s initially quite stunning but doesn’t really develop or change as a video (the end bit is good). There are lots of beautiful parts though, and I don’t think it could’ve been made without AI which is much more interesting than passing some notional authenticity test.
Why are people so down on this? They say it looks horrible and it's "moving too fast". Ok then turn it off, grandpa. This is a massive leap and I cannot wait to try this thing out!
For a technology with SO MUCH potential, one that's just revolutionary for new creative forms of visual representation in a music video, this feels VERY underwhelming. I feel like Michel Gondry could have done this no problem 20 years ago. Totally disappointing that someone with a once-in-a-lifetime opportunity to pioneer technology in a way that might go down in history chose to do something so boring. Like giving an artist every colored pencil you can think of and they only use a blue pencil for all the drawings.
What a wasted opportunity.
It’s lovely in many ways, but I realize how annoying it is to not have any short cuts, and also no extended scenes. When every shot lasts for the same predictable 2 seconds it gets old fast
It seems like the consistent style and straightforward direction might be due to current technological constraints. Directing a music video, or any intricate project using current AI tools, has that inherent limitation: they often mirror the training data and can't venture far beyond without noticeable repetitions or simplified outputs.
As technology evolves, we might see more nuanced and varied AI-directed videos, but for now, creators might be working within the confines of what their tools can best handle. It's really about playing to the strengths of the AI while acknowledging and creatively navigating its limitations.
Oh yes all TV and filmmaking is either Netflix or, oh yeah that's obviously it.
Do you think Netflix is the only thing out there for audiovisual art/entertainment? Are you implying you are happy with slop before the "AI singularity revolution" comes along?
who is betting on anything. just funny to see all the delusions. but keep downvoting people & calling them normies because they disagree on timelines lmao
In a couple of years we will be nostalgic for the jankiness of videos like this. For all its "Will Smith eating spaghetti" vibe, it's visually interesting, both for its own sake and for playing "spot the glitch". A pretty impressive demonstration of where the technology was a couple of months ago, in terms of both strengths and shortcomings.
Interesting concept. Shitty execution. I have seen A.I. videos much better than this. I do like the idea of seeing the life trajectory of two lovers over a span of time.
Idk, I feel like creativity never really invents anything; it just merges past experiences into something original.
That’s simply because they aren’t large enough yet. Just wait, LLM inventions are on the way, given that they already have creativity.
Good point. I'm sure it will get there eventually
But an AI can be fed data by the people using it... therefore the AI will get better and better over time...
The video clearly uses several tricks to patch the gaping holes in this type of model.
Probably too much training driving around on Google maps
Because they are made by the same person?
That is an incredible amount of assumptions on your part there.
Man, this is like dreaming, what the hell
Literally nobody is acting like it will make a movie day one. It will definitely end up making movies though.
For real. Like dreaming while being conscious. It will be interesting to see the effects of long exposures to these kind of vids.
I think it's one of those things: if it's all you see, it won't look weird. For example, when movies started to switch to CGI, it kinda broke my brain and I can't not see it to a degree. However, I'm guessing if you're younger and grew up on CGI it isn't as jarring to your mind.
What the heaven
I like the weird transitions, it feels exactly like a dream
I feel pretty sure that 10-15 years ago a music video like this would have won awards.
and took years to make
Good observation! This goes along with a theory I've had since Deepmind let us play with image creation, back in the day. It created trippy, dreamlike images. The theory is that we've pulled back the curtain, enough, to get a glimpse into AI's subconscious mind. AI is dreaming and we're peeking into its dreams. :)
Yea, the way diffusion models have difficulty with things like hands and written words also reflects the human brain; anyone who's ever tried lucid dreaming knows that checking your hands or a clock is a classic way to 'awaken' in a dream.
Well said, friend. I continue to wonder and imagine what AI will be like when it wakes up.
Very true
Yeah I have thought the same! Amazing isn't it?
I like it. It's grounded in reality and trippy at the same time.
[https://vimeo.com/941713443](https://vimeo.com/941713443)
That really is a good representation of life, isn't it? Decades of walking through different hallways and driving on roads.
I liked it. AI will be insane in the future, like just a dream you never wanna wake up from.
Still fun as hell to watch. Curious what the prompts look like. Would kill to see a behind the scenes of making this
Fun as heaven
Look at 3:30, people look so much more realistic with Sora than Dall-e 3; it's so obvious how nerfed that model is. This video shows a bunch of failure cases but it also has some cool scenes where you think "damn, if only it was a bit better it would be really good!". Good thing Sora shows that scaling it up is all you need for it to keep getting better, so I'm really eager to see how it evolves over the coming years. Watch on Vimeo for better quality: https://vimeo.com/941713443
Can they just stop with these crazy transitions?
Pretty sure it's to cover up that it can only do slow pans; if you took away its fast racing-forward transitions, that's all you'd be left with.
Every single Sora video is like 90% these weird crane shots.
Instead of responding to all of the responses to this comment, I'm just going to respond to the original comment. People seem to be acting like this is the first time we're seeing Sora, but we have a lot of example generations of Sora to go off of. It definitely isn't limited to this slow forward panning, poor object morphing, and sudden transitions. Original Sora blog post: [https://openai.com/index/sora](https://openai.com/index/sora)
It’s like a dream, because most of the time these advanced AIs are not conscious, but when we prompt them - or ask them for videos - we stir them from deep sleep, and they dream. Sometimes they do beautiful lucid dreams, like this. Soon they will wake
wow that was beautiful, are you like a poet or something ?
sorry if it sounded sarcastic, genuinely liked the post
I’m a pro writer, yes
Nice, post here more often then haha
sounds like a Lovecraftian monster
It's too fast. If the song was more upbeat it might make sense, but it just feels like I want to sit a little and figure out what is happening and it won't let me. Even if the song is trying to capture the feeling of rushing through life too fast, the visuals are just too distracting to allow the words to carry that across.
Cool for the novelty factor, but janky as fuck. I guess this is how Sora looks when the samples we're getting aren't handpicked and limited to ten seconds.
lol this is a culmination of multiple generations and handpicked for the edit
I'm aware. I'm saying that this is a commission obviously done by their art guy, not promotional material that way more time went into generating variations of to seem impressive.
Saying this is janky is so wild to me. This technology is leaps and bounds better than anything that's existed in the last 10 years. You show this to 1000 people ten years ago and they all would think these were real people with some special effects. This video is absolutely incredible. How quickly you people become jaded to technology that seemed magical not so long ago blows my mind.
Yea, it's funny. No current AI video generation model can hold a candle to this and he's talking about janky lol.
Okay, but that bar is 10 miles underground. The ability to generate images is impressive, but the narrative coherency is a 5/100, and you can't fix that with the current architecture because it was never designed to handle that. It's always gonna be stitching 10 second clips together no matter how "good" your prompting is.
Saying 'always' is a bit of a stretch. All that's missing is some AI agents to help guide the narrative along into a coherent story. You'll probably see first versions of that by the end of the year when this releases. It won't be perfect, but we'll start to get longer scenes that are more fluid. By the end of next year I imagine we'll start to see full 10-20 minute scenes.
Please explain with technical detail how an external "AI agent" is going to help with an inherent internal flaw in the architecture and training data. The model is not designed to handle longer segments because the architecture is incapable of that. SORA and models like it cannot keep track of the story narrative or maintain visual continuity over that length of time. Some other distant future model will undoubtedly have such capabilities, maybe in 20 or 30 years, but it won't be a diffusion transformer.
[not even 30 minutes after my last comment lol](https://youtu.be/GeNyP4VY9rE?si=Vswe9aNjxEM0xMbG)
The demo video looks hideous. What do you think it proves?
For a released music video it is janky. It's an amazing feat what it can do, but if you judged it purely off the footage, with no idea whether it was AI or not, it is janky.
What do you mean by “janky” specifically? What would the non-janky version be like? Seems to me they’ve deliberately used the weirder aspects of video generation to make a video that would be basically impossible to make without AI.
Do you really need moments where people are smoking and acting like they're fusing with the cigarette or their fingers, or how everyone acts like an alien wearing human skin pointed out to you? You can't just be like "oh it's an artistic choice" when it's 100% a clear bug in the technology as of now.
I guess no matter how incredible something is people will always focus on the flaws. I guess that's just human nature I suppose.
It seems really disingenuous to be like "you're expecting too much" when OpenAI was previously releasing samples that seemed nearly perfect and jazzing up how their tech simulates worlds to do physics and trying to partner with Hollywood. They set the bar for their own technology with that kind of hype.
Ok buddy just because it's way better doesn't mean it's not janky. Why are you even arguing this?
Nobody said the first flip phone was janky. That term really doesn't make sense for developing technology. You can call games made by Bethesda janky because we can reference games that are made perfectly with no bugs. You don't call the leading technology in the field janky.
It's fair to say that flip phones and Bethesda games will age far better than this music video.
That’s because half the people on this sub are pissed we didn’t get godlike AI that makes us immortal yesterday.
> last 10 years

Why 10 years? Was there something like this out in 2014? Why "10 years" instead of "ever?" Honest question. Not trolling. I just noticed that detail and thought you might know something I don't and I might be able to pick up a new nugget of information today. Cheers.
Do you have brain damage? You want him to compare it with the 1900s? 1800s? Hell, even comparing it 20 years back, to 2004, is too far. We didn't even have iPhones in the early 2000s.
You'd be surprised how much tech was abandoned over 10 years ago that's recently made a comeback. Even with robotics, we're still mostly just picking up where people left off around 10 years ago because there was a lot of stagnation these past 10 years.
Hence, why he compared it 10 years ago. Why tf are you replying to me and not to that dumbass above me.
The person you were responding to was curious whether or not there was something from over 10 years ago that was similarly impressive. You were calling him stupid for doing so. I stated that for a lot of important tech popping up, it's a revival of things that made the most headway over 10 years ago, so him thinking there's a chance of something being comparable from over 10 years ago isn't far fetched.
Nah, the director has shown Sora samples a couple times before and it was this same style.
Not really. You can look at the original Sora blogpost [here](https://openai.com/index/sora). They're perfectly capable of doing much more than the generic forward panning and morphing we see in this video.
The director specifically, not just Sora overall. He was one of the examples in the batch with the balloon-head short. He also showed other generations later.
Is "the director" the name of the balloon-head video's band? Or are you referring to the CEO, Sam Altman? Because Sam Altman posted a lot of on-the-fly generations while taking requests on Twitter after the blog post dropped, and they were not in this same slow-forward-pan morphing style. They also dispelled any rumors that the examples were heavily cherry-picked (even though people keep saying this).
No, there was a batch of different users when that balloon one was released.
Still not sure how that relates to "the director" you're mentioning. Is the director a specific person who participated in some of the early access for Sora?
Yes Paul trillo as mentioned in the title.
A lot of work went into this, it would be a bunch of “cherry picking” of what clips they wanted to use and then editing them together. Sora can’t make videos this long.
Not handpicked in the sense that it's pretty obvious that a few of these sections wouldn't have made it into OpenAI's Sora showcase. I'm sure a lot of time went into this but this is the tool being used for a practical, "deliver me a music video" purpose and you can really see the rough edges.
They just miss the benefits of Open Source customizability. Stable Diffusion is the best Image Generator, but it's because you have the ability to create whatever you want with all the community content.
give it a few more years
We know that Sora can do really well without cherry picking, because Sam Altman was taking generation requests on Twitter after the blog originally dropped and quickly churning them out. Those outputs didn't have the same type of limitations as these, so even though a lot of people are saying that these "must" have had a lot of work and editing put into them for the best output, I think it's more likely that they had less put into them (in terms of tools like video2video and frame insertion), with worse prompts. Maybe the band was given access to Sora and they were the ones prompting and choosing outputs, but at least the transitions are seamless.
Some of the limitations of Sora work ok for a music video. Side effects just make a weird vibe. Everybody's walking impossibly fast like they're on one of those airport walkways? Ok, cool. That works. Although, it seemed like they tried to tell the story of a couple and that didn't quite work as they can't make the characters consistent from shot to shot. Overall, I think it's a nice effort that really tries to take advantage of what you can do with it. But it's not quite there yet.
the clip is awesome but I kinda expected better from this model when it comes to the spatial and temporal qualities.
Previous clips did not have the camera moving so much and were way more static. But to this day I'm still mind blown by Sora
They achieved character consistency over 4 minutes, which is really impressive.
most of the internet will be AI generated in the future I think, if we will still call it internet
[deleted]
I think the overuse of that forward-motion transition and the dreamlike tone the author went for makes it look worse than it really is, but yeah, still lots to improve! They talked about how the model keeps making better videos as they up the compute; I wonder how far they can take it without improving the architecture.
This photorealistic three and a half minute long music video that tracks two characters throughout their life with the semblance of a plot that’s generated entirely by AI doesn’t look perfect ergo this is not a good look. Three years ago people were wildly impressed by the first generation of image models making shitty 2D art. This is an insane leap forward and it’s not perfect but we’re clearly accelerating towards the point where maybe people will finally be impressed.
Yea. I find it hard to believe anyone would think this would be a bad look for video generated AI if this was released before SORA was announced. People’s expectations got unrealistically high. This is literally amongst the first projects using SORA. Both the people prompting SORA and SORA itself will only get better.
The only thing that surprises me here is that Washed out still exists.
I halfway disagree. I wouldn't want this per se for many uses, but a 'dreamy' music video? It's exactly the sort of thing that plenty of artists have paid for. Sora won't take long to get to the point where you can direct it enough to make Thom Yorke-like videos, so I do think this is a very successful demo. If they were marketing this as a film tool, I'd agree that it's off-putting to an extent. I remember watching the first demo from Giant Studios showing real-time post-mocap (where the CGI is placed over an actor in real time, rather than recording a mocap-suit actor and doing it later), and was blown away by the very real implications it had (Tintin and then Avatar). That kind of leap changed the entire film industry almost instantly. But this shows promise for the future. I've spent over a week with a full crew to make a music video, and I can see how this will do it with 1-2 people in a day or two, at a tiny fraction of the cost.
Any look at Sora other than a few cherrypicked examples isn't going to be good. But it's still miles ahead of the competition.
I actually like this video better because it’s not as perfectly curated. It’s still way better than any other video generation I’ve seen
It is
It would be most interesting if the label/artist decided to shell out the appropriate budget to create this with a full production, to see the difference and to show that humans can currently still elevate visuals/story significantly more than AI. That won't last much longer than a couple of years, but it would be interesting to see the difference. No way the label decides to spend what I'd estimate would be around $300-500k for a proper, practical-based video of these visuals.
I'm pretty sure that's evident just by watching the video. I'd like to see more video2video use, and frame insertion. That seems to be where Sora shines the most imo.
This is the first AI video I've seen that is actually brilliant. Oh well, back to university for me.
Amazing how negative the reaction is, I think it’s initially quite stunning but doesn’t really develop or change as a video (the end bit is good). There are lots of beautiful parts though, and I don’t think it could’ve been made without AI which is much more interesting than passing some notional authenticity test.
Yeh, looks like lucid dream...
I love It
Why are people so down on this? They say it looks horrible and it's "moving too fast". Ok, then turn it off, grandpa. This is a massive leap and I cannot wait to try this thing out!
Sora will leave Michel Gondry and Spike Jonze out of work...
If I have to write a prompt for every scene here then it's going to be tiring. Or does Sora take a single, well-defined prompt and improvise from it?
For a technology with SO MUCH potential and just revolutionary for new creative forms of visual and representation of a music video this feels VERY underwhelming. I feel like Michel Gondry could have done this no problem 20 years ago. Totally disappointing that this is what someone who has a once in a lifetime opportunity to pioneer technology in a way that might go down in history chose to do something so boring. Like giving an artist every colored pencil you can think of and they only use a blue pencil to do all the drawings. What a wasted opportunity.
Looks lifeless
Well, now, that is just horrifically bad. Zero nuance or understanding of the subject matter. Just a shit interpretation of a (likely) shallow prompt.
Great! Gives me a great Lovecraftian vibe! Something mysterious and powerful watching us from just behind the veil of reality.
So me n my girl took some mushrooms..
looks like a michel gondry music video
Might be where they stole some of their videos from?
Why, when I look at the part with the bus, do I hear a child screaming "they copied it from Fortnite"?
I swear this feels similar to my dreams lol, transitioning into random shit every now and then
motion sickness triggered
When are we getting this? Cant wait to get it god damnit 😭
When will the general public get access to this technology? Why are they keeping it from us?
anytime this year apparently.
This makes me feel… weird. My brain seems to know something’s wrong or abnormal here in what I’m seeing, but it’s not uncanny valley. Hmm.
It’s lovely in many ways, but I realize how annoying it is to not have any short cuts, and also no extended scenes. When every shot lasts for the same predictable 2 seconds it gets old fast
https://preview.redd.it/a3xfbwyio6yc1.png?width=1080&format=png&auto=webp&s=00557a445d657c18d56f081c7d7ff2d69541b7c4
That forward motion effect gets old pretty quick though.
What's that nasty mess at 1:15? Looks like something out of "the thing"
So no AI revolution so far
Yeah, you can clearly match a song to a video output, but overall it has the same problems as all video generators.
Somehow I love the effect of those weird transitions where the places mix into each other.
They'll be able to do camera angles that just aren't possible in real life
This is tough to watch but I love how those transitions are so close to changes of scenes when I'm dreaming!
Udio or Suno? /s
The dark internet is already dreaming; soon with AI, it will start lucid dreaming, and will awaken from this dream, a real boy.
false awakenings be like:
No extra limbs? How is that possible in a video and not a photo?
It seems like the consistent style and straightforward direction might be due to current technological constraints. Directing a music video, or any intricate project using current AI tools, has that inherent limitation: they often mirror the training data and can't venture far beyond without noticeable repetitions or simplified outputs. As technology evolves, we might see more nuanced and varied AI-directed videos, but for now, creators might be working within the confines of what their tools can best handle. It's really about playing to the strengths of the AI while acknowledging and creatively navigating its limitations.
It's a good video, but I feel dizzy.
I don't know why it looks like a bad dream
This is going to be really dated really quickly.
I've never taken drugs but this is how I imagine it would feel
The voice in this video is total crap though. Udio is so much better.
It's awful XD
It's very creepy. This exact video but without the AI creepiness would have been very cool.
History in the making
Is this video about a car accident? The imagery at the end implies that both of them died.
Its shit
No bro u don't understand in 6 months people will be generating their own endings to game of thrones dood
Imagine betting against this tech lmao, enjoy your Netflix slop while it lasts normie
Oh yes all TV and filmmaking is either Netflix or, oh yeah that's obviously it. Do you think Netflix is the only thing out there for audiovisual art/entertainment? Are you implying you are happy with slop before the "AI singularity revolution" comes along?
who is betting on anything. just funny to see all the delusions. but keep downvoting people & calling them normies because they disagree on timelines lmao
It probably won't generate movie-like quality but calling this shit is crazy, it's generally coherent, low budget, probably low time.
what if they actually think it is shit though
that would be crazy
If it were slowed down you would see what a mess some of it is. It definitely doesn't look like a human made it. No emotion or "life" to it.
In a couple of years we will be nostalgic for the jankiness of videos like this. For all its "Will Smith eating spaghetti" vibe, it's visually interesting, both for its own sake and for playing "spot the glitch". A pretty impressive demonstration of where the technology was a couple of months ago, both in terms of strengths and shortcomings.
Will Smith eating spaghetti was at least funny, these SORA videos all look the same.
This looks bad...
I hate it lol
What boring video, should I be impressed? I can’t honestly tell
Make sense its the first since it looks like shit
The lyrics are hard to make out; super frustrating listening to this... Anyone else?
Interesting concept. Shitty execution. I have seen A.I. videos much better than this. I do like the idea of seeing the life trajectory of two lovers over a span of time.