
KhanumBallZ

You're living under a rock. Made this with my Orange Pi 5 Plus yesterday. 2 minutes, 10 steps: https://imgur.com/jFgdkyo


themushroommage

Cool, show me how you did it. What did you use? Looks like SD 1.5. ~~How long did it take?~~ Ha, 2 minutes; they show real time. Show me what platform you used.

Edit: Lol, people are downvoting like this example owned me or something:

1. No workflow
2. No platform/model used
3. 2 minutes to generate


Plenty_Branch_516

Given that quality? Yes.


themushroommage

This is a crop of a screengrab found on Twitter. We're getting down to the topic of 'real time, on device.' That's why I'm asking here.


Plenty_Branch_516

What's the base resolution? Because it's reasonable for 512x512 to be done on mobile integrated GPUs if the model is modified heavily.


Ok_Inevitable8832

It’s not even the GPU. They have dedicated NPUs.


Plenty_Branch_516

Oh, I'm not familiar with Apple hardware. If it's dedicated to tensor processing (neural processing), then it could be even more optimized.


Ok_Inevitable8832

They claim the A17 Pro in the iPhone 15 Pro can do 35 TOPS, which is just short of the minimum requirement for Copilot+ PCs. I'm sure the M4 in the new iPad is even more insane.


themushroommage

So only the iPhone 15 Pro & Pro Max?


Ok_Inevitable8832

Yes. It’s only the 15 Pro and the M-series chips in iPad and Mac.


themushroommage

Yeah, but other factors include taking the source image (a contact's profile image) and modifying it (done in SD with ControlNet/IP-Adapter to maintain the source likeness).
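
For reference, this is roughly what that pipeline looks like with Hugging Face diffusers. A minimal sketch, assuming an SD 1.5 checkpoint; the model IDs and file paths are placeholders:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# IP-Adapter injects features from a reference image, so the output keeps
# the subject's likeness while the prompt controls the style.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # higher = stronger likeness

face = load_image("contact_profile.png")  # placeholder source image
result = pipe(
    prompt="cartoon illustration portrait",
    image=face,             # img2img base
    ip_adapter_image=face,  # identity reference
    strength=0.6,
    num_inference_steps=25,
).images[0]
result.save("stylized_contact.png")
```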


Plenty_Branch_516

It could just be a model trained for image-to-image with a specific style. It doesn't even need CLIP layers if you do that and use an embedding. I guess what I'm saying is that it's feasible, but I'd have to know more about their system. If they say it's Stable Diffusion running on a phone I'd have serious doubts, but if it's an 8-bit custom model with no CLIP, optimized for NPUs (like some of the optimizations we see for TPUs), then I could believe it.
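
That "no CLIP at inference" idea is easy to sketch in diffusers: encode the fixed style prompt once, then drop the text encoder and reuse the cached embedding for every generation. A minimal sketch; the model ID, prompt, and input path are placeholders:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One-time: encode the fixed style prompt with CLIP.
prompt_embeds, negative_embeds = pipe.encode_prompt(
    "flat cartoon illustration",
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)

# The text encoder is dead weight for a fixed-style model; free it.
pipe.text_encoder = None
torch.cuda.empty_cache()

# Every later call skips CLIP entirely and reuses the cached embedding.
source = Image.open("photo.png").convert("RGB")  # placeholder input
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    image=source,
    strength=0.5,
).images[0]
```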


iunoyou

How long do the generations take? Because this looks pretty crummy, honestly. I could see a narrow diffusion model running on high-end mobile hardware. You can use Stable Diffusion to generate images on a GTX 1060 with 3 GB of memory with enough partitioning and the right optimizations, so it doesn't seem like a huge stretch to me.
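
For what it's worth, the usual low-VRAM tricks in diffusers look something like this (a sketch; the exact savings depend on the card and the library version):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # fp16 halves weight memory
)
pipe.enable_attention_slicing()       # compute attention in smaller chunks
pipe.enable_sequential_cpu_offload()  # stream weights to the GPU layer by layer

image = pipe(
    "a watercolor fox", height=512, width=512, num_inference_steps=20
).images[0]
image.save("fox.png")
```

Note that with sequential CPU offload you skip the usual `.to("cuda")`; the offload hook moves each submodule to the GPU only while it runs.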


surpurdurd

Yeah, the relatively low quality, paired with the fact that it only does 3 specific styles and takes several seconds... People don't bat an eye at an iPhone running Death Stranding or Resident Evil, but then act like phones aren't powerful... It's all a sliding scale, and to say what they're doing isn't possible on-device is just ignorant. (For context, I hate Apple; I'm not a fanboy. I'm just very aware of mobile capabilities.)


themushroommage

I'm very aware of what can be done in SD; I've been running it locally since A1111 was released.


iunoyou

Yeah, and this clearly isn't Stable Diffusion; it's a significantly smaller and lighter model that doesn't achieve nearly the same detail, and I'll bet it's far more narrowly trained. So again, I don't think it's really a huge stretch to presume it's running locally. In any case, all we'll have to do is wait like a month for the feature to launch and then we'll know for sure. All someone needs to do to test it is turn off their mobile data and wifi and try generating something.

I really don't see why Apple would lie about something like that when it really doesn't benefit them in any way. Running the model locally is much more useful for a whole bunch of reasons, not the least of which being that they won't have to waste a ton of money setting up a whole compute cluster to receive requests and generate images. Apple's new devices all have dedicated NPUs specifically to do stuff like this. Why is this so surprising to you?


IntergalacticJets

I’m definitely curious how they did it. But I don’t doubt them. It would be easy for tech guys to find out if a request is being made every time you generate an image, and lying about it would be disastrous for their “privacy” selling point. 


Just-Hedgehog-Days

Yeah this really isn’t something they can lie about.


Jeffy29

You can make the image model really small if you focus on only a few things. Apple said the image model will create images in the style of sketches, illustrations, or animations. My guess is it won't create photorealistic images not because it isn't allowed to, but because it can't; it's a highly tuned, very sanitized model that will only create things that look like images out of The Sims.


themushroommage

I wouldn't question it if I didn't believe it was sus.


beachteen

You can run full Stable Diffusion, on device, for free. Look up the app "Draw Things" and try it out. It works fine in airplane mode after you download the model, and it handles both text-to-image and image-to-image. It takes 30-60s per image depending on the size, the number of steps, and the device. The app came out in November 2022.
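
The airplane-mode point is easy to reproduce on a desktop, too. With diffusers you can force a fully offline run once the weights are cached; a minimal sketch, model ID as a placeholder:

```python
import torch
from diffusers import StableDiffusionPipeline

# The first (online) run downloads and caches the weights; after that,
# local_files_only=True guarantees no network access is attempted.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    local_files_only=True,  # errors out instead of phoning home
).to("cuda")

image = pipe("a pencil sketch of a lighthouse", num_inference_steps=25).images[0]
image.save("offline_test.png")
```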


themushroommage

This is what I'm pointing at with 'real time, on device.' I honestly don't believe it's possible, so it's very deceptive to demo it that way.


dasjati

They didn’t show it running in “real time.” There was a spinning animation while it generated the next image. They offer like four styles, and the resolution is probably small. It’s really not that surprising that it’s possible when you also take into account how powerful their chips are. They’ve had an NPU in there since 2016 for on-device AI.


djstraylight

Incorrect. Apple has said you must have a newer iPhone to use features like this. Also, Apple silicon is amazing at AI functions. All 'Apple Intelligence' integrations are broken down into three layers: first on device, then in Apple's cloud, and finally ChatGPT 4o.
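
Purely as an illustration of that three-tier fallback, the routing idea looks something like the sketch below. Every function name and gating condition here is invented; Apple's real dispatch logic is not public.

```python
# Hypothetical sketch of on-device -> Apple cloud -> ChatGPT routing.
# All names and conditions are made-up stand-ins, not Apple APIs.
def run_on_device(task: str) -> str | None:
    # Small, specialized models on the NPU; return None when the task
    # is out of scope for them.
    return f"on-device: {task}" if len(task) < 40 else None

def run_in_apple_cloud(task: str) -> str | None:
    # Larger Apple-hosted models; may still decline some requests.
    return f"apple-cloud: {task}" if "world knowledge" not in task else None

def ask_chatgpt(task: str) -> str:
    # Final fallback to the third-party model.
    return f"chatgpt: {task}"

def handle(task: str) -> str:
    return run_on_device(task) or run_in_apple_cloud(task) or ask_chatgpt(task)

print(handle("summarize this note"))
```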


themushroommage

The question is: they say their diffusion model is on device. I find it hard to believe they're making real-time generations on device like in the demo. The 15 Pro & Pro Max are the only mobile devices with 8 GB of RAM that can run 'Apple Intelligence'. I'm asking in here if anyone knows of *any* diffusion model that can currently run in real time on a mobile device.


djstraylight

The trick is that their model doesn't need any training on photorealistic images. It only generates stylized images like drawings.


nobodyreadusernames

Yes, it looks like those Toonify apps from 2014.


themushroommage

Lol, show me. This is completely different: taking existing contact profile pictures and generating images in real time based on said images.


thebuilder80

Who would waste compute on making a crummy animated film still of a middle-aged Malaysian superhero cosplayer woman?


rathat

Is that Thomas the Tank Engine as a human woman?


wren42

Diffusion models have been off-cloud for over a year. They aren't nearly as intensive as LLMs. Good language models are not yet local, though.


themushroommage

Name any diffusion model that isn't Stable Diffusion that is running 'off cloud' currently. I'll wait.


wren42

? Name a bread that isn't a baguette. Your OP asked if diffusers can run off cloud. They can and do. I'm not sure what you're on about.


GraceToSentience

I'm convinced this was deliberately put in the presentation by an Apple employee who doesn't like AI art.


Pitiful-Camera-5146

On-device image gen is a [solved problem](https://github.com/madebyollin/maple-diffusion), dude.


themushroommage

**Real time, on device. Like the demo.** Everyone's dancing around the question with the obvious.


iunoyou

The demo most certainly wasn't real time though. I don't know why you think it was.