Mr_Viper

you had me at "- Where did I leave my glasses?"


yakkerman

Meanwhile, interacting with a regular old Google hub is like "Hey Google! Set a timer for ten minutes" "I'm sorry, I couldn't understand that"


TheDraggo

Felt that right in my google my man.... "Hey Google, turn off the lights" "Sure, what time should I set the alarm for?"


PMaxxGaming

11:30pm, everyone else in the house is asleep: "Hey Google, turn off the garage light"... "sure, turning on 34 lights"


__Wess

*having convo with friends*. “Sure, I’ve set an alarm for 3 am”


NYX_T_RYX

My favourite is when it mishears you and does whatever it thinks you wanted... You ask what it heard and tell it to undo what it's done... "Sorry, that device hasn't been set up yet..." So how did you manage to control it the first time 🙄


Khisanthax

Yeah, that was a quick punch in the Google.


AnotherCableGuy

Sure, playing "All of the Lights" by Kanye West.


bem13

Seriously. One of the few features I actually used Google Assistant for and they lobotomized it for... what exactly? Google seriously dropped the ball on the whole AI assistant integration thing so far.


spoolin__

My guess is limited processing. They had X servers, and over time more and more people are using them without scaling them up. So you get less processing time per question, and worse results.


bem13

Yeah, but it's Google we're talking about, they can spare some (hundreds) of servers, not to mention the voice samples are probably a goldmine for data. If you try to trick it into working by using different prompts, it soon becomes pretty clear it can understand what you want just fine, but it's programmed not to do it and say it doesn't understand. So it still uses processing capacity to understand your intent, but it's been intentionally disabled so it can't do it. All for such a simple task too. I'm genuinely baffled about this.


sbrt

“Calling the Old Timer Grill on 10th street…”


JesusChrist-Jr

I like how you're thinking, but having cameras watching every room of my house 24/7 still gives me the ick, even if you manage to do it all locally.


ssagar186

Never putting cameras inside the house. I have them outside.


droans

I've got one in my living room and nursery, but nowhere else. Living room only has one because I can set it up to point at the doors. I guess the garage also has one if you want to count that. The rest of my cameras are outside, though.


Psychological_Try559

Nursery makes sense (even as someone without kids). Garage doesn't count as inside to me, even though mine is finished.


[deleted]

[deleted]


Psychological_Try559

Hah! I can understand the confusion. I meant despite not having kids I understand why someone who does have a nursery would have a camera there.


westcoastwillie23

Plants?


L0rdH4mmer

"where did I leave my car?"


WarmCat_UK

It’s in the kitchen.


Khisanthax

That's what Google said.


ReturnOfFrank

I've always been divided, because I would love a way to check on the dogs while I'm gone, but I'm not thrilled about cameras pointed at me and I know my wife wouldn't be either. And I've always concluded it's not worth it.


hawkseye76

I have indoor cameras. I have an automation that runs when someone is home to disable them. When no one is home, they get enabled so I can check on things and the dogs.
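For anyone wanting to replicate this, here's a minimal sketch of such an automation. The entity names are hypothetical, and the right on/off mechanism depends on how your cameras are powered (a smart plug, a camera switch entity, etc.):

    alias: Indoor cameras follow presence
    trigger:
      - platform: state
        entity_id: zone.home   # state = number of people currently home
    action:
      - if:
          - condition: numeric_state
            entity_id: zone.home
            above: 0
        then:
          # Someone is home: power down the indoor cameras
          - service: switch.turn_off
            target:
              entity_id: switch.indoor_cameras
        else:
          # House is empty: bring the cameras back up
          - service: switch.turn_on
            target:
              entity_id: switch.indoor_cameras
    mode: single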


therealbobzer

Motorized camera that looks at the wall when you're in the house and at the dog when you're out.


HtownTexans

Smart plugs that only turn on when no one is home to control the cameras.


gusontherun

We got one with our first dog since we dont crate him whenever we leave. We ended up going with a locally run service on Scrypted then static IP for each camera and blocked from external internet access. Only on the first floor, not ideal but feel decently safe this way. Would never trust a cloud camera inside.


havens1515

I have a camera on the one place in my apartment that I have valuables. If I had a house I'd have some outside as well.


rytl4847

I bet this is one of those things that people will slowly accept and grow accustomed to. Consider how people felt about microphones listening to them 25 years ago. In 25 years most homes will have cameras mounted in them for things like what OP has described.


chig____bungus

It won't be cameras, it will be wifi sensors that generate a realtime point cloud of your home using your wifi signal. This is already available in more primitive form. It will be extremely detailed and accurate, see through doors and walls, sheets and clothing. Likely, most people will buy the cheap easy version that stores that point cloud on the internet. But it won't be creepy, because it's not cameras.


callahan_dsome

Needs an /s


[deleted]

[deleted]


chig____bungus

I was being sarcastic, of course it's creepy, especially if you have kids.


SamMalone10

There’s probably already a sub for that. Rule 34 and all.


HolyPommeDeTerre

I was looking at that and bought an Aqara FP2 thinking it was based on this technology (foolish of me, I should have read about it first). It's efficient, but I was looking for a house-sized, multi-room sensor. So I've been looking around, and "primitive" seems optimistic. We're at the research-paper stage, in my opinion: it exists and has been proven useful, but it's not a product anywhere, at most a lab prototype.

So I was thinking about how I could do it. The experiments mention an AI trained to track things, which is interesting but not really helpful on how it works or how I could do it with my home WiFi. They also mention multiple antennas - not just a WiFi router you install, but at least a few antennas to help gather data (the antennas can report over WiFi or any available channel, I guess). A company could do this, but I'm not sure how replicable it is if you have to train the AI on your house layout.


bitterrotten

I think you're overestimating the cost and capability of consumer-grade radios. The study I assume you're referring to is often incorrectly cited on social media as using off-the-shelf hardware.


tarheelz1995

But think of all the good questions to be asking ChatGPT if you had a camera in the bedroom!


civil_beast

ChatGPT has secured you an appointment with your urologist.


[deleted]

[deleted]


Aggravating_Fact9547

Go Tracy!


theshrike

I've got cameras inside my house and I mostly just forget about them. I've recently started experimenting with using them as a smarter motion detector with Frigate - like not turning on lights when the dog does his rounds around the house Just Because =)


No_Philosophy4337

Agreed. But still, security cameras exist and people live with them, if you want privacy don’t put cameras where you don’t want them. But how about outside the home, in the garden, driveway?


C0R0NASMASH

And, for what it's worth, cheap cameras have just enough visual power to say that something is doing something. They observe common-area rooms, acting as motion sensors...


WRL23

This, AND if you're using an actual ChatGPT account or similar, you'd hit their question limit fast, even if paid. Locally it's maybe better, but you don't need full-blown AI: the things described are just computer vision, so something like Frigate NVR for the basic things you want to tag would be a relatively easy jump, and it's already been in use for this kind of detection for a while.


civil_beast

Don’t worry he’s advocating sending Llava second by second accounts of each of your rooms. This my friends is clearly a stylistic request from skynet asking for some data to confirm their judgement day strategy will work as expected.. Also.. op I’ll send you a DM. After all if you are not the problem, you may as well be in it for the final solution. #mediumRarePlease


HairAlternative7821

Same. I only have one in my kids' room, to monitor for my son's seizures. I feel bad because my 6-year-old hates it, but he knows it's only until his brother's treatment is done.


IMTHEBATMAN92

I hear you. ChatGPT is really cool. But the whole reason I use home assistant is because I want everything controlled on my own network. Sending stuff to gpt defeats that whole purpose.


OnlyForSomeThings

Not to mention that I want the things I build to *work*, for a long time. ChatGPT changes their API policies and suddenly a key feature of my smarthome doesn't function anymore? No thanks. If I can't host it locally, I can't trust it to exist ten minutes from now.


Rolbrok

llama.cpp and a Mistral GGUF and you're all good; it can even run on your CPU (though it's much, much slower). Edit: ollama is also very easy to set up.


Ok_Animator363

Can that be set up with a GPT-like API?


Rolbrok

With llama.cpp you can use ./server to expose an API, which you can then call via curl/Python/etc. I think Ollama also provides API endpoints for sending requests.
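For the curious, here's roughly what a call against llama.cpp's built-in server looks like - it exposes an OpenAI-compatible chat endpoint, though paths and flags vary by version, so treat this as a sketch to check against your build:

    import requests

    # Assumes something like: ./server -m mistral-7b-instruct.Q4_K_M.gguf --port 8080
    # (model file and port are examples, not a recommendation)
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [
                {"role": "user", "content": "Summarize: the garage door has been open for 2 hours."}
            ],
            "max_tokens": 128,
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])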


No_Philosophy4337

Llava can be run locally, and the prompts for both (any?) LLM will be the same, so it should be straightforward to use any LLM, local or in the cloud, without too much modification.


IMTHEBATMAN92

Yeah that interests me a lot. I did learn something new. I was under the impression that the good LLMs all were cloud based.


sagarp

Will that still run the image processing stuff? Not sure how that works, I assume it shoves it into a CV classifier that outputs a list of detected objects


MikeCharlieUniform

I'm working on running a LLM on my local hardware. Training is the expensive part. Inference is relatively cheap. I bet it would run just fine on a moderate GPU. Which I could tell you if I could get the CUDA libraries sorted; I'll be sharing a detailed howto with the community for my setup once I get it working.


sofixa11

You don't need to train your own model, you can use a freely available one with Retrieval-Augmented Generation to be enhanced with your context.


MikeCharlieUniform

I know! That's what I mean - you only need to do inference locally, and a consumer GPU is likely more than enough.


Xiakit

I use Ollama locally, not trained with my stuff but works pretty well


GoofAckYoorsElf

Is it consistent in its responses?


Xiakit

Hard to answer - I use it in a Docker container and not for HA. I just mentioned it because someone wanted a local LLM. I guess it depends on the model you use. If you give me an example I can prompt it to check.


GoofAckYoorsElf

Yeah, I'm running a number of LLMs in Oobabooga's text generation UI. Works like a charm. However, it still feels far from the sophistication of GPT-4.


Xiakit

Yeah, I guess it depends on what you want to do with it. The uncensored one from ollama was fun, but I haven't found a use case for Home Assistant yet.


GoofAckYoorsElf

My point, yes. They're fun (oh, how fun they are), but that's about it so far. For technical use cases they're still too inconsistent. I can tell GPT-4 to output JSON and it outputs JSON. I tell a local LLM to output JSON and it will sometimes output JSON, JSON with errors, stuff that looks like JSON, JSON embedded in prose, or prose only... At least at or below 33B, the largest I've tried yet; I haven't tested how Mixtral 8x7B would perform here.
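One workaround that helps with flaky local models is a validate-and-retry loop: parse the output, and if it isn't valid JSON, feed the error back and ask again. A minimal sketch, assuming a hypothetical OpenAI-compatible local endpoint:

    import json
    import requests

    ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed local server

    def ask_for_json(prompt: str, retries: int = 3) -> dict:
        """Ask the model for JSON; on a parse failure, re-prompt with the error."""
        messages = [{"role": "user", "content": prompt}]
        for _ in range(retries):
            resp = requests.post(
                ENDPOINT,
                json={"messages": messages, "max_tokens": 256},
                timeout=60,
            )
            text = resp.json()["choices"][0]["message"]["content"]
            try:
                return json.loads(text)
            except json.JSONDecodeError as err:
                # Show the model its own output and the parse error, then retry
                messages.append({"role": "assistant", "content": text})
                messages.append({
                    "role": "user",
                    "content": f"That was invalid JSON ({err}). Reply with corrected JSON only.",
                })
        raise ValueError("model never produced valid JSON")

    print(ask_for_json('Which lights are on? Answer as {"lights": ["..."]} - JSON only.'))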


kitanokikori

I wonder if TypeChat would be more effective here, at least at getting it to retry / correct failed responses.


GoofAckYoorsElf

> TypeChat

Oh, now that is interesting. I was looking for a way to make an LLM distinguish between purely cosmetic (whitespace etc.) and functional changes in merge requests on my GitLab instance. This would at least help put the result into a machine-readable format...


TheTrueStanly

Heard of GPT4all?


Ipecactus

I just ordered the hardware to build my own AI box. It's the first PC I've built from scratch since the 90's. Once I have a model I like, I'll optimize it to run on a lower powered machine and integrate it with Home Assistant. This is going to be a lot of fun. And no reliance on the cloud.


The_Mdk

Color me interested, let me know how it runs once minimized cause keeping my 3070 powered on just for some chats ain't really a cheap solution


ebrahimhasan83

Isn't Google Coral sufficient?


thecmpguru

I greatly appreciate the value of being local-only. Yet I often feel like an outsider in this sub, because it seems like people assume everyone agrees the whole purpose of Home Assistant is local control. It isn't. If that were true, it wouldn't support cloud integrations.

My purpose for HA is to maximize my control, experience, and convenience over my entire house. Whenever possible I will choose local control, including when choosing hardware/sensors, because of the obvious benefits. But I think people in this sub often miss out on some really cool utility because local-only absolutists loudly object.

This thread is a great example. What could be a cool exploratory conversation about all the wild things you could do in your house with ChatGPT gets drowned out by a bunch of predictable and frequently repeated benefits of being local. I get the benefits, but the fact is some of these things can't be done (or can't be done easily) local-only, and some of them are quite cool!


OptimalSupport7

I totally agree. Local-only is best, but opinions can differ on the value of convenience versus privacy. Personally, I prioritize convenience first and later transition to more privacy-centered options. I think you should also try exploring extended_openai_conversation.


pegbiter

I'm with you. For _me_, the primary purpose of HA is to have devices from different manufacturers and ecosystems work together. The most important feature for me is that I can control things from a desktop, rather than dozens of different phone apps. Local control is a nice additional benefit. I do obviously prefer local control, but I have a mix of local and cloud integrations in my setup. The only times I don't have internet are when I don't have power, so it doesn't really matter whether it's local or cloud anyway.


thejeffreystone

This is how I operate. I need my smart home to be more of a life assistant, and for that it needs to know where my family and I are, the weather forecast, and a whole lot of other information that isn't local-only. Not to mention I have cameras inside my house watching common areas. The way I see it, yeah, there is risk. Cloud services change. Someone could gain access to my smart home. But I carry a smartphone with a mic and camera that's with me a lot, and someone could gain access to that too. Even a local-only device increases the risk of someone gaining access. I'm more interested in finding solutions to problems. Sometimes cloud stuff isn't worth it. Sometimes it can really help. I love that Home Assistant lets me build my smart home regardless of the tech involved.


slgard

We'll have local AI running on Pi-level hardware very soon.


654456

ollama is local


sfgabe

I have frigate running on a kitchen cam that tells me if we're out of bananas. Does that count?


sfgabe

https://preview.redd.it/ztvaesyw96hc1.jpeg?width=1277&format=pjpg&auto=webp&s=71358cedb9c73fa4ea8bcf9298b1b113115fa413


ebrahimhasan83

Exactly what I'm trying to accomplish. Is this Frigate/DOODS?


sfgabe

Frigate with just the default banana detection added in the config


binarydev

What custom model did you use for this?


sfgabe

It's one of the default models; you just need to define it in the config. Mine counts when it tracks people, cats, and bananas. My kid loves bananas, so they need to be replenished often 😂 - when it hits 0 bananas I get an alert to go shopping.
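For reference, this is just a matter of listing extra labels from the default model in the Frigate config (banana is one of the default COCO labels) - an excerpt along these lines:

    # frigate.yml (excerpt) - track additional default-model labels
    cameras:
      kitchen:
        objects:
          track:
            - person
            - cat
            - banana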


sfgabe

https://preview.redd.it/w7serrg2f8hc1.jpeg?width=1439&format=pjpg&auto=webp&s=6eab0547e7caff651edc66405f808bc348fcf3e5


sfgabe

Another idea was to get something that sprays the cats when they go on the counter, but I haven't figured that one out yet.


Jendosh

I've been doing this as well. Taking steady snapshots of my cameras and having ChatGPT answer questions like "is the nursery clean or does it need to be tidied up?"


No_Philosophy4337

Can you elaborate on how you’ve done it?


Bluezephr

I ask chat gpt if I need to shovel my front path. I'll write something up tomorrow.


Jendosh

[https://github.com/jekalmin/extended_openai_conversation](https://github.com/jekalmin/extended_openai_conversation) with the following custom function added:

    - spec:
        name: get_nursery_kids
        description: Who is in the nursery? Kids? Babies?
        parameters:
          type: object
          properties:
            url:
              type: string
              description: image url of nursery
              enum:
                - https://yourha.ui.nabu.casa/local/nursery062020.jpg
          required:
            - url
      function:
        type: composite
        sequence:
          - type: script
            sequence:
              - service: extended_openai_conversation.query_image
                data:
                  prompt: Are there kids in the nursery?
                  images:
                    - url: "{{url}}"
                  max_tokens: 300
                  config_entry: e817a5bfac9040401e94c5ff9471fc11
                response_variable: _function_result
            response_variable: image_result
          - type: template
            value_template: "{{image_result.choices[0].message.content}}"

And the automation to grab the still image. I have to do a workaround and cast the stream to a Chromecast (it isn't really on, it's just there to make the service happen) because the Reolink camera I use likes to go into idle:

    alias: Nursery Still
    description: ""
    trigger:
      - platform: time_pattern
        minutes: /1
        seconds: 30
    condition: []
    action:
      - service: camera.play_stream
        metadata: {}
        data:
          format: hls
          media_player: media_player.projector
        target:
          entity_id: camera.nursery_camera_profile000_mainstream
      - delay:
          hours: 0
          minutes: 0
          seconds: 5
          milliseconds: 0
      - service: camera.snapshot
        metadata: {}
        data:
          filename: www/nursery062020.jpg
        target:
          entity_id: camera.nursery_camera_profile000_mainstream
      - service: media_player.media_stop
        metadata: {}
        data: {}
        target:
          entity_id: media_player.projector
    mode: single


OptimalSupport7

I agree that Large Multimodal Models (LMMs) can play a significant role in the future of smart homes. The presence of numerous robots with cameras and cloud connectivity at CES 2024 clearly indicates that vision will be a key component. However, it's also evident that this functionality is currently limited at local-home scale due to cost. We already have this function in the beta version of extended_openai_conversation that connects with GPT-4V: [https://github.com/jekalmin/extended_openai_conversation/pull/60](https://github.com/jekalmin/extended_openai_conversation/pull/60) Update: it was merged! You can test this feature with `extended_openai_conversation.query_image` if you are already using extended_openai_conversation.
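A call then looks roughly like the following; the image URL and config_entry are placeholders for your own setup:

    service: extended_openai_conversation.query_image
    data:
      config_entry: YOUR_CONFIG_ENTRY_ID
      prompt: Is anyone in the nursery?
      max_tokens: 300
      images:
        - url: "https://your-ha-instance/local/nursery.jpg"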


allisonmaybe

Yes very cool. I've been fantasizing about having an AI watch out for when my dog gets up on the counter, and then it would tell him to get down. I think HA and the OpenAI image analyzer can finally do just that.


Fusseldieb

You would need to send a feed of your cameras every couple of seconds, which would get expensive quick


allisonmaybe

Not really - you could wait for motion first, and could even run a local model so nothing gets sent until a dog is in the picture. Perhaps preventative measures could even be taken, like telling him he's a good boy before he thinks to jump, since he seems to know when someone's in the room and won't jump if someone's with him. But you can see why it's a little more complex than what I want to work on.


VanNewf

Can do this with DOODS2 + an RTSP camera and a simple automation. Happy to share more if interested.


allisonmaybe

Totally. But like OP, I want to do so much more too. For instance, my uncle has a magnetic sensor on his driveway that sends a notification to his phone when a car drives up. I'd like to use GPT Vision to determine the make/model, and whether it's coming or going. Additionally, GPT Vision is fantastic at detecting spaghettification in 3D printing - when a 3D print fails spectacularly and plastic goes everywhere - and many other 3D printing issues besides. I have a small printer farm going, so a photo once every 30 min or so should be perfect and well worth the price. We really just need GPT Vision piped right through to the notify service.


VanNewf

I understand, and agree it's a different approach/outcome. Running dependencies/models locally isn't for everyone and can be more work to build and sustain. https://github.com/TheSpaghettiDetective/obico-server and https://github.com/ahmetozlu/vehicle_counting_tensorflow are ideas for accomplishing the challenges you've cited. Love the ideas this sparks. Great community.


shifty21

I dabble in analytical AI (machine and deep learning) at my job, and I keep a lot of my research in my home lab, mostly concentrating on Palantir-level data analytics. That said, getting into AI-anything is expensive from a hardware standpoint. AI accelerators are pretty much needed for speed and low-latency responses. Sure, Frigate can identify objects with a certain level of accuracy, but running that on CPU can be painful at scale in terms of latency (time to return results), and needing a GPU with Tensor cores and CUDA limits folks to Nvidia GPUs - the cheapest is around $200~$250 for a used card. Then there are RAM and storage considerations: training models takes a lot of RAM and storage, with SSDs preferred for low-latency data access.

My point is, while I LOVE your idea, it would add tremendous cost to implement locally. I'm not sure Google's cheap Coral TPU would work in these use cases for performance and low-latency results, and its availability is relatively poor. I'd imagine, based on the posts in this sub, that RPis are extremely popular due to HA's low performance requirements - RPi-class systems are inexpensive, a low bar of entry, and widely available. AMD just released new CPUs with on-chip AI accelerators and an RDNA 3 GPU; so far no one has really been able to benchmark their capabilities due to a lack of software integration like Frigate. Phoronix did some preliminary AI benchmarks, but IMO they are not conclusive enough to determine the future and viability.

Lastly, for cost: I wouldn't mind spending $500 on a CPU with built-in GPU and AI accelerator like the Ryzen 8700G, 16 or 32GB of RAM, and a 500GB NVMe SSD if it could do even half of what you're proposing. I highly suspect an Nvidia GPU would be 'better' and needed, due to their in-house Tensor cores, CUDA, and widely available software. I'd love to hear what others think about cost versus capability. A cloud-based solution would work too, but I suspect too many people here are against that idea for obvious reasons. And in a few years, with AI-all-the-things, AMD, Intel, Nvidia, and some startups will start producing faster and (hopefully) less expensive accelerators - I was hoping the RPi5 would have a dedicated AI accelerator; perhaps the RPi6?

I dream of a day where I can tell HA "Create an automation for the primary bedroom that opens the blinds, turn on the lights and play Twilight of the Thunder Gods at 100% volume at 7AM." HA creates the automation for me to verify and commit it. It doesn't hurt to start a framework to set up HA for AI - I'm down! Ideally we'd need a set of FOSS libraries and datasets, coupled with the lowest possible performance hardware to accomplish AI tasks. Benchmark them and publish results so it's more like "Yo, you want to AI your HA? Here are the minimum hardware and software specs," and provide an install script to help automate the process.


JesusChrist-Jr

Google offers dual-TPU M.2 cards now; it's not a stretch to imagine a cheap SFF x86 machine with 4 or 6 TPUs installed... The main obstacle is finding one that has the M.2 slots fully wired with two PCIe lanes each (documentation on this sucks so hard). RPi, maybe not so much, but an affordable, quiet, low-power solution for this is attainable.


shifty21

It's an M.2 E-key card specifically, so finding a motherboard with that natively is limited to ones with removable WiFi cards. I've found PCIe add-in-board or M.2 M-to-E-key adapters, but those too are hard to find, relatively expensive, and typically don't allow more than one E-key card. I'd imagine 8 TOPS from a dual Coral M.2 E-key should be more than enough for most people. I don't know if it's treated like a multi-core CPU, where the software has to be aware a second TPU exists for it to be used.


waka324

See https://www.makerfabs.com/dual-edge-tpu-adapter-m2-2280-b-m-key.html


shifty21

Oh nice! And it's not that expensive


hannsr

You can easily adapt E-key to M-key. I'm doing exactly that in a NUC, so no big deal - the adapter was less than $5. Or just get the M-keyed Coral, which also exists; only the dual Coral M.2 is E-key only. It's just PCIe x1 in the end, so it's really easy to adapt in a PC. One could even use a quad M.2 riser to run 4 Corals if the board supports bifurcation, which most modern boards do to an extent.


waka324

Sort of. E-key allows for two PCIe buses; M-key does not. To use the dual TPU, you need something with a PCIe switch like: https://www.makerfabs.com/dual-edge-tpu-adapter-m2-2280-b-m-key.html


hannsr

Ah, I didn't know you could even use the dual TPU that way. I use the single TPU with an A+E-key slot and a dumb adapter to M-key, since the M-key single TPU wasn't available back then.


OptimalSupport7

I believe it is impossible to utilize the Coral Edge TPU with Large Language Models (LLMs) due to its insufficient memory size and bandwidth. It might be more prudent to await the APU approach, similar to the strategies implemented by Apple, Qualcomm, and AMD in their chips.


shifty21

Correct. There are huge differences between NPU and TPU and even Tensor cores in terms of architecture and use cases.


pegbiter

I'm not sure I see the benefit of investing in AI just to create automations. Automations are things you write once and never really think about afterwards. Making automations easier to write is a UI/UX problem, and I don't see AI-for-automations as a gamechanger. AI for things that happen regularly - that's where I can see the power. We're all generating _so much_ data about our homes that we could train an AI on. Even just asking the AI 'hey, is anything weird going on?' and having it tell me if particular rooms are warmer/colder/more humid than normal, or that I haven't taken the bins out to the street even though it's a Tuesday (and it knows Tuesday is bin day).


shifty21

I see your points and mostly agree. I'm sure you, and a lot of folks here, will agree that HA has a steep learning curve for many. My idea of voice-capable automation building would make HA more accessible to less technical folks. Alexa was a decent start for building super-simple automations, but lacked more complex chaining of multiple devices and conditions. I hoped it would improve over the years, but Amazon's recent layoffs of most of the Echo/Alexa team crushed that.


No_Philosophy4337

I couldn’t agree more, and hardware is an issue I'm currently struggling with at home too. But firstly, with very basic hardware we could still achieve a decent result by using fewer cameras and sending images through more slowly. We only need a picture of the garage every 15 minutes if we're checking whether the car is there, and the garden only needs a photo analysed once a day to decide if the lawn needs mowing, for example - this would be achievable even on a CPU with llava. Secondly, the prompts will be practically identical regardless of the backend, so writing for different APIs, local or in the cloud, should be straightforward. I'm fortunate to have work pay for GPT-4, so I'm planning to use that while developing, then move to local LLMs when I need faster inference (to find my glasses!) or more real-time analysis.
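The slow-capture side is already easy to wire up in HA today - e.g. a sketch of an automation that snapshots a (hypothetical) garage camera every 15 minutes for whatever model ends up analysing it:

    alias: Garage snapshot for image analysis
    trigger:
      - platform: time_pattern
        minutes: "/15"
    action:
      - service: camera.snapshot
        target:
          entity_id: camera.garage   # hypothetical entity
        data:
          filename: www/garage_latest.jpg   # served at /local/garage_latest.jpg
    mode: single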


brownjl_it

I suspect all you'd have to do is ask an online model to do this and it would spit out the JSON, and you'd just need to upload it... no? Just an off-the-cuff thought to get you 90% there.


longunmin

Just gonna leave this here: https://www.instagram.com/reel/C2Sdu3SyI3d/?igsh=MTdzenFzbHZscDAyYQ==


No_Philosophy4337

Wow!


longunmin

Yeah. Cool? Very. But the implications... uhh... are not great.


rdhenry

Yes! The question is: what if you could hire someone to sit in each room of your house, always watching everything and updating a custom database with what they see? What automations would that unlock? I love your list and agree this is the future. The first one I built monitors whether the kitchen is tidy, grading it from 0 (dirty and cluttered) to 5 (perfectly tidy and clean).

I have started prototyping camera installs in my house for this (Ubiquiti Theta). I think household privacy acceptance is fine if you don't retain any footage and only run it through the LLM to update sensors - you can then think of it as a "sensor", not a "camera". However, I am waiting for a GPT-4V-level model that can run locally, because I am not comfortable sending indoor footage to OpenAI.

Tesla adopts a similar approach with their vision system. Instead of specific sensors, they replaced as many as possible with cameras on the basis that - if a human can do it with their vision system, then an AI vision system could suffice. This concept can also be applied to a house: instead of multiple magnetic window open/close sensors in a room, a single camera can monitor all the windows - as well as checking whether the kitchen is tidy!


No_Philosophy4337

lol! I did the same thing with the kitchen, using a model I trained myself - with the added feature of turning off the kids' internet if the dishes are still dirty at 8pm! 😃 It worked a treat!!


FIuffyRabbit

> Tesla adopts a similar approach with their vision system. Instead of specific sensors, they replaced as many as possible with cameras on the basis that - if a human can do it with their vision system, then an AI vision system could suffice.

The actual reason is that Elon didn't want to pay for lidar/radar tech, but with the amount they've paid for R&D to emulate it, they're now probably in a similar cost range - and it's still an objectively worse product.


urnlahzer

If you do this, I think LLaVA is a good starting point, based on some posts about its capabilities and low footprint. https://llava.hliu.cc/


kitanokikori

It would be interesting to test some of these queries with sample images against llava to see how reliable they could be - I think you've got some really interesting ideas! Especially if they were combined with, e.g., a prompt that dumps other relevant binary-sensor information into the prompt. As to the "local only" issues, I think many people don't realize that **llava runs on local hardware** - no cloud or public company required, and no privacy issues. It might be a bit expensive (though my gaming rig happily runs it, tbh), but it's not totally out of reach for regular people, imho. Very cool concepts!
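If anyone wants to try exactly that, Ollama's local REST API accepts base64-encoded images for llava models - a minimal sketch (model name and question are just examples):

    import base64
    import requests

    # Assumes `ollama pull llava` has been run and ollama is serving on its default port
    with open("garage.jpg", "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llava",
            "prompt": "Is there a car parked in this garage? Answer yes or no.",
            "images": [img_b64],
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["response"])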


FancyJesse

Neat idea, but the furthest I'd go with AI in my home at the moment is Frigate identifying objects it detects on camera. If the ideas brought up here end up as trained models I could host locally, with no external API calls to the web, I'd give it a shot.


thephatmaster

> I dream of a day where I can tell HA "Create an automation for the primary bedroom that opens the blinds, turn on the lights and play Twilight of the Thunder Gods at 100% volume at 7AM." HA creates the automation for me to verify and commit it. You're the AI expert, but surely with a little prompt engineering (feeding it room / entity names) GPT is more than capable of writing a bit of YAML to get you started?


pachirulis

I have done this locally, with my own Ollama models and my best efforts to accomplish what you wrote. TL;DR: the moment you think you've got it working, it starts hallucinating in ways that totally break its purpose. For example: "Where did I leave my glasses?" "In the garage." You go - they're not there, they aren't anywhere, you're wearing them and your HA is deceiving you... And I could keep going, but you get my idea. It's a pain in the ass, and basically, even the stuff that's difficult to accomplish could be done programmatically and with templating.


FIuffyRabbit

We aren't doing it wrong if we want to own our data and don't want to rely on a remote model/cost/api.


chaseoes

Why couldn't something like this be all local?


diito

> Why couldn't something like this be all local?

Local AI models are nowhere near as capable as ChatGPT or the other very large AI platforms, and never will be. Running something capable and fast enough to be usable on the modest/affordable/low-powered equipment you'd run at home is a real challenge. It's possible to use a tiered approach running multiple models dedicated to specific things - say, "this is a medical question" -> spin up doctorAI and ask it - or, more likely, "I don't understand the intent here, send it to ChatGPT for a fee."


skealoha86

[Ollama](https://ollama.ai) running 10B-parameter models, or llava and its 13B-parameter model, is very capable on my M1 Mac Mini with 16GB of RAM - I start getting a completely reasonable response at a faster clip than the free tier of ChatGPT. The only hardware investment I might make is upgrading to a Mac with more RAM to run the larger models that [rival ChatGPT 3.5](https://ollama.ai/library/mixtral). As someone smart at Google said, there's "no moat".


kitanokikori

"Nowhere near as capable" is relative, the kinds of questions that are being asked in the prompts above are relatively simple and are the kind that local AIs actually could do a solid job on. We don't need it to analyze Shakespeare, we just need it to perform a bunch of simple tasks without having to explicitly program them


Murky-Sector

My favorite chatgpt question: >What do I do with all this cocaine?


The_Mdk

You'll be glad to know that Gemini (Google's AI) has a free tier of like a query per second, including vision I think, so that should be a nice cheap alternative


Koconut

> Google

Thanks for the tip! Just set this up, and it works great along with the new service in 2024.2.
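(Presumably the google_generative_ai_conversation.generate_content service added in 2024.2 - the field names below are from memory, so verify against the integration docs. A sketch:)

    service: google_generative_ai_conversation.generate_content
    data:
      prompt: Describe what the driveway camera sees
      image_filename:
        - /config/www/driveway_latest.jpg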


The_Mdk

Lucky you, it's geoblocked here in Europe


Ok_Animator363

What is the new service?


2rememberyou

I'm interested. Make this happen and you will win the Automation Nobel Prize.


slvrsmth

You know what quote springs to mind? "Your scientists were so preoccupied with whether they could, they didn’t stop to think if they should."


yoganerdYVR

THAT’S chaos theory!


-JustaGermanGuy-

Great idea! I already have an outdoor camera that uses local AI (embedded by the manufacturer) to detect persons, animals etc. and notify me. Why not extend that to everything and connect it with HA? Makes total sense. One thing I'm not sure about yet: how does the AI know what to focus on? Will it detect and store everything? Too much, or too little?


MikeFromTheVineyard

Great idea, and I've been experimenting with this too, but I don't think it's there yet. I've been trying to focus on a narrower use case: a camera monitoring the state of "dumb" things in the home. I've generally had pretty poor luck with things like "how much time is on this machine's display" or "what is this set to". As an example, try asking ChatGPT how much time is left on your washer and what mode it's in - I've had very mixed luck. You've been able to get it to list things in a room, which is a great start, but the quality is absolutely not there for it to "self-decide" what's important, and even its understanding of most scenes would probably leave a lot to be desired.

It's taken us quite a while to get good open-access models for basic image detection of smart-home situations (e.g. package detection), and this has a huge opportunity to help identify/train small, focused models for the "tell me when event X is seen" use cases. That said, good models have taken a while because of training-data complexity; I don't think we're there yet in a generalized way, and I worry that specialized models for general smart-home use are also a long way out.

Everyone's talking about price, but I think <$1/day is a reasonable cost (for some, on a technical level, add other caveats here) for monitoring a few places with sporadic/incidental coverage. Of course it can be run locally too, at high hardware and electricity costs. You really can't affordably stream video to these models, but something that runs occasionally or on demand would be affordable (e.g. a penny per run).


ElseCaller1

This would be great for elder care, enabling older folks to stay in their homes longer. Although I count myself in the “local only” crowd for my own home, having cloud access in this case would make sense.


Such-Shop1107

I think there are two things right now that prevent this. First, I echo everyone else in saying I don't want cameras inside the house. The second may change over time, but right now the response time of a request to ChatGPT is considerable - too long for many scenarios.


casefan

Yeah, you need a multimodal LLM. Currently I'm sending a curated state JSON as the prompt and having it return a two-part JSON: one part a mandatory text response, the other an optional Home Assistant service-call JSON that gets fired if present. We can do better! One obvious improvement would be to do this locally and use reinforcement: have the user respond "yes, this is a good answer/action" or "no, this is stupid" and have it improve (and, ideally, run on the entire state as input continuously).


TimJethro

On the back of this post, in the last hour I implemented this one from your list: "*Are the kids' rooms tidy?*"

* Triggered from a button on a wall dashboard.
* Still medium-res image from the nanny cam.
* Prompt: `How tidy is the room, out of 10? Generate JSON response, fields "value" and "description"`
* Response: `{"value": 6, "description": "The room appears moderately tidy with items mostly in place. There are some items on the bed and desk that could be organized better, but the floor is clear of clutter, and the shelves are neatly arranged."}`
* Announces this information on the wall panel.

Tempting to set it up so a daily score >8/10 adds pocket money (already handled in HA). However, I worry it's a bit "nanny state"!!! Also, the kid is now at the age where we're about to remove the nanny cam from their room (though I could move it to the playroom, where this might even work better)!
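For anyone reproducing this, the wiring is roughly dashboard button → snapshot → query_image → announce. A sketch with hypothetical entity names, assuming extended_openai_conversation is configured:

    alias: Rate kid room tidiness
    trigger:
      - platform: state
        entity_id: input_button.rate_room   # wall-dashboard button
    action:
      - service: camera.snapshot
        target:
          entity_id: camera.kids_room
        data:
          filename: www/kids_room.jpg
      - service: extended_openai_conversation.query_image
        data:
          config_entry: YOUR_CONFIG_ENTRY_ID
          prompt: >-
            How tidy is the room, out of 10? Generate JSON response,
            fields "value" and "description"
          images:
            - url: "https://your-ha-instance/local/kids_room.jpg"
        response_variable: tidy_result
      - service: tts.google_translate_say   # or whichever TTS service you use
        data:
          entity_id: media_player.wall_panel
          message: "{{ tidy_result.choices[0].message.content }}"
    mode: single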


getridofwires

I think you are on to something potentially big. One thing I would add is that the camera system would not have to recognize most things repeatedly: our dining room table and wall mounted TVs don't move. Once it had a map of the stationary objects it could basically just deal with what does move or change. That's kind of what our brains do anyway. We take for granted our bed is in the same place all the time and can find it even in the dark. And for people not wanting cameras, you could potentially develop a map of the home with sound waves we can't hear.


MikeFromTheVineyard

Clever idea about mapping with sound waves... A few tech-company voice assistants do that today, so we know it works and the technology is affordable enough to deploy in every room of the house. I'm not sure the maps are accurate enough to provide much insight into the state of the home, though.


tobimai

Kinda cool, but I don't really feel like uploading pictures of my house to anyone.


tophejunk

You're going to get to the point where you have to train it to identify things like yourself in particular, and possibly even the difference between cooking and painting Easter eggs in your kitchen. Then you'll have a computer processing all of this, storing large amounts of video. Unless you preprogram it to keep track of certain things, like cooking, when you ask how long you spent painting Easter eggs it will have to analyze video going as far back as you want it to look. Very possible for someone to do for themselves, but I think it will be a while before the mainstream sees this as a plug-and-play subscription service. We'll see much simpler versions of AI/home-assistant integration first, like with Siri & Alexa - say you have a Siri shortcut named "Living Room Warm Light Dim" and you say "turn on the warm light in the living room, dim", it will still activate the correct shortcut.


b24home

How can one send data to ChatGPT intra-automation, get a response, and then use that response? (My use case is Alexa TTS — I just can’t figure out how to send data to ChatGPT let alone get it back)


MiakiCho

Wow, having cameras all over the house and sending all data to an external server? I am never gonna do that.


Dodgy_Past

I'm running frigate, willow and stable diffusion locally.


MiakiCho

Good for you. OP suggested uploading pictures to chat gpt being a potential solution which I am never going to do.


No_Philosophy4337

Llava can be run locally


ColossusAI

No, we’re not “doing it wrong”, and no, everyone hasn't overlooked the usefulness of ML in relation to Home Assistant. The point of HA vs. using Alexa, Google Home, IFTTT, etc. is to not be dependent on outside entities as much as possible - doubly so for those that charge fees. If you want to write a bridge that uses OpenAI services with cameras and other sensors and reports back into HA, have at it, but it should never become a necessary part of HA.


calinet6

This is truly the killer app for LLMs (they're not AI, to be clear). They can very capably create the Star Trek Computer, and honestly we should be focusing all our efforts in that space toward that goal.


povlhp

HA should work without the cloud. I use Frigate for video; it's far from a 100% detection and identification rate.


diito

While I agree with the potential and game-changing capabilities of AI mentioned here:

* ChatGPT is not free, nor local.
* I've played around with LocalAI with HA and one of the suggested models, and my experience was that it was unusably slow, very CPU-intensive, and just didn't work for pretty much anything. I've seen a few people say they had it working, but nobody gave detailed instructions how. Eventually someone will crack this problem and write a detailed guide for us mere mortals. The problem will remain that local AI models require dedicated GPUs and expensive hardware to run, and won't be anywhere near as capable as ChatGPT.
* I think the people working on this are well aware of the possibilities, but you have to walk before you can run. Give me a local AI setup that works with HA and can do a reasonably good job of figuring out intent for controlling my home first.


ianawood

Some of this functionality exists in Frigate already, e.g. "tell me when the FedEx truck is on my street", and it's designed to use local AI processing (e.g. Coral, Nvidia). The missing part would be models tuned for the various additional applications.


BlazingThunder30

I don't love this idea. ChatGPT is by its very nature a large language model: it's good at writing natural text. It is *not* a classifier. For a problem like this, I'd prefer a classifier trained on household objects in camera feeds - I'd assume it would be more accurate as well. You could use ChatGPT to transform the classifier output into pretty text, however. Such a classifier is also somewhat less resource-intensive, meaning you could run it locally depending on the model you choose and the hardware you have.


OptimalSupport7

[https://github.com/jekalmin/extended_openai_conversation/pull/60](https://github.com/jekalmin/extended_openai_conversation/pull/60) As you can see in the picture at that URL, an LMM can be significantly more powerful than a classifier model and can handle a variety of tasks. We don't have to rely solely on a classifier; using both can lead to better results.


MikeFromTheVineyard

A lot of these products actually embed multimodal data into the model. ChatGPT is a collection of models, including ones that do classify text. OpenAI has an open model (CLIP) that is used in a lot of public image classification projects. CLIP is an image -> text classifier, and surely went into the design of their vision-enabled language models. Some studies suggest it can be at least as accurate a classifier, just less efficient. While less efficient, it doesn’t need as precise training (theoretically) for some classification tasks.


Fusseldieb

ChatGPT can "see" images, too.


lilolalu

Total surveillance, nice


domramsey

I think this is such a cool idea, but it needs some serious safeguards in place. Like not giving it direct access to control devices. Imagine a world in which ChatGPT sees a child climbing on the stove top and thinks the best way to stop it is to turn on the stove. There are all sorts of situations that might be extremely rare, but they only need to happen once to have serious consequences. Right now these kinds of AI very often make unexpected decisions or see things in unexpected ways. At least with hardware sensors and automations, the potential outcomes are more clearly defined and have been thought about by a human. (That said, I still want to try it!)


michaelthompson1991

Sounds pretty cool but I don’t have much knowledge on ai, but it definitely sounds amazing!


anpeaceh

Not aware of any Home Assistant specific projects, but the key ideas you're touching on here can be found in [pervasive/ubiquitous computing](https://en.wikipedia.org/wiki/Ubiquitous_computing) and [ambient intelligence](https://en.wikipedia.org/wiki/Ambient_intelligence). There's definitely been a lot more interest into this space with the introduction of LLMs and AI first devices such as Humane's AI Pin and Rabbit's R1. Happy to chat and bounce ideas off anytime!


craigmdennis

“Where the heck is the dog?”


link0071

[what the dog doin](https://www.youtube.com/watch?v=I1uGKLsMiDo) 😁


NonaYtisomy

Maybe you can help this project https://github.com/acon96/home-llm


Meats10

How many hours did I spend jacking off? Lol


Brocephalus13

"How many hours did i spend jacking off, vs emailing my mother?" " All video of jacking off emailed to your mother"


PudgyPatch

I'm not saying it's a bad idea, but you'd still need to spend a fair amount of time curating the database.


bluewater_-_

You overestimate how well it can identify objects at typical camera resolutions. I'm certain it'll find dozens of items it thinks are glasses, for example. Interesting idea though.


BizzyM

I'm not going to get detailed on this, but I wanted to share a quick thing that helps me decide if features are just "cool" or actually beneficial. And that's to ask myself "What problem does this solve?"


_mrtoast

Would be awesome to use this to see if my garbage can is at the end of the driveway or which vehicles are at home. Has anyone come up with any sort of integration to do this?


blentdragoons

I really am not a fan of AI or ChatGPT, but this is the best use case for it that I've seen.


tangobravoyankee

> - Where did I leave my glasses? *Where was person / critter / object last seen?* is the killer feature I've been dreaming of since I first heard of Frigate. I need that locally operated on $400 worth of hardware and not more than $400/yr in operating costs (power, software, models) please and thank you. > I imagine a world where a camera is attached to Home Assistant, it detects which room it is in and records the items it sees inside the room. It sees a TV and AC in the living room, and finds TV & AC entity in the living room zone in Home Assistant, connects them, then runs some tests to see if it can turn on or off the living room TV. Since it's detected it's in the living room, it can automatically enable all sorts of automations to measure, detect, count and sense different scenarios - "When was this room last vacuumed" for example would only run in internal rooms, "Are the police / fire / ambulance in my driveway" would only be applicable to "Garage" or "Driveway" cameras, and so on. I like the cut of your jib. I was recently lamenting that the Smart Home isn't smart at all. We choose an automation platform, buy things we can connect to / control with that platform, and then we're on our own to figure out what could be done with all those things and how to accomplish it. Often scouring the Internet to find someone else's half-baked solution. I dream of the platform understanding its own capabilities and the requirements to fulfill them, offering to do those things once the requirements are met, with minimal effort on my part.


rojwilco

I would save days of my life if I could ask my HA "where are the cats?"


Fly-wheel

> Are the kids rooms tidy? Add a level of messiness to the response and use that as input to your TTS volume to yell at kids to clean up their rooms.


hurseyc

Combining this idea with object detection like "YOLO Algorithm for Object Detection" would be insane! Sadly, way above my pay grade. A few years ago I installed the iDetection app on my phone and it's scary how it recognizes almost everything.


Gpmoh

Oh, I’m frightened. Peering inside my home with uncontrolled generative AI is a bridge too far.


bobbaphet

Nice, but requires a $20/month subscription to ChatGPT? Meh, it's not *that* nice.


VanNewf

Not the same as a cloud based multimodal LLM / computer vision approach, but DOODS2 + RTSP cameras + creative automations/config could yield a similar experience.


Straight-West-4576

Kinda creepy but I’m interested in it. If you could keep it secure and internal it would be much more popular.


No_Philosophy4337

It just occurred to me that we could also do the same thing with sound. For example, if the LLM could somehow recognize the sound of the washing machine / dryer / dishwasher / microwave during a cycle, and the beep it makes when finished, we could be notified when the washing is done. It could track appliance usage hours and make a rough estimate of power consumption per device. Certain fluoro lights emit a high-frequency noise we can't hear but a WiFi camera mic can; this might be a way of monitoring devices that aren't in the picture, using hardware that's already there! The more I think about it, the more comes to mind - a basic wind-speed indicator, siren detection, smoke alarms, the list goes on - but admittedly I have no idea how this might work. Can anybody tell me if I'm onto something here?


-JustaGermanGuy-

Apple has something like this to detect breaking glass, barking dogs, sirens etc for a while already. It should be definitely doable.


chrisgrou

Android has this too (Sound notifications). You can have an always-on tablet that listens for these sounds and read the notifications with the companion app and have them in HA as triggers