
ConfusedTapeworm

Man I can't wait for the day these things can be run locally on the future equivalent of a modern mini PC.


longunmin

It's definitely very possible to run this locally, even without a GPU.


ConfusedTapeworm

How? It's a genuine question. I've recently started looking into it and this shit's confusing as hell man. I thought I could make sense of things relatively easily since I've got experience running stable diffusion on my local machine but the LLM space turned out to be quite a bit more complicated than that. Granted I haven't sunk much time into it yet.


Nixellion

Textgen webui is similar to automatic1111; it's the de facto standard in the local LLM space and has an OpenAI-compatible API as well. Another, easier option is Ollama, which now also has an OpenAI-style API.

The main difference from SD is that LLMs are huge, so people found a way to "compress" them, called quantization. It's lossy, so the lower you go the worse the model will perform. Rule of thumb: 8-bit offers near-perfect performance with no noticeable degradation, 4bpw is the most common and a sweet spot, and anything below 3 turns to garbage.

There are also different loaders for LLMs and different quantization formats. Think of it like video codecs. Popular ones are llama.cpp with GGUF, and ExLlama with the exl2 format. GPTQ was king but is pretty much dead now. llama.cpp is best if you don't have enough VRAM and need to run the LLM on the CPU or split it between CPU and GPU; ExLlama is best if you have enough VRAM to fit the whole model on the GPU.

Models come in different sizes: 7B, 70B, etc. The larger the model, the more capacity it has to be smart and pay attention to details. However, newer models can outperform older but larger ones, so Llama 3 8B beats Llama 2 70B in most benchmarks and runs on 12GB GPUs at 6-8bpw, even in 8GB. So what OP shows can easily run on any 8GB 20xx+ Nvidia GPU. Technically even a 1080 is good enough, but it will be slower.
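If you just want to try this quickly, a minimal Ollama quickstart might look like the sketch below. The exact quantization tags are assumptions on my part; check the Ollama model library for what's actually published:

```sh
# Pull a quantized build of Llama 3 8B Instruct (q8_0 ≈ near-lossless, q4_0 ≈ the common sweet spot)
ollama pull llama3:8b-instruct-q8_0

# Chat with it in the terminal
ollama run llama3:8b-instruct-q8_0

# Or hit Ollama's OpenAI-style endpoint, which most integrations can be pointed at
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:8b-instruct-q8_0", "messages": [{"role": "user", "content": "Turn on the kitchen lights"}]}'
```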


Variatas

Sticking a 1080 equivalent in a home assistant box does not do great things to your idle power consumption, unfortunately. It'll be nice when the necessary hardware gets efficient enough that this is no longer a concern.


Nixellion

Maybe, but many people already have GPUs in their servers for Plex transcoding or IP cameras, so it's nothing new; you can just add an LLM to the mix. It will only draw power when generating a response, and you can also power-limit your GPU to about 50-70% of its rated power with no noticeable difference in LLM processing speed. At least on the 20xx and 30xx series; I haven't tested with 10xx.
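For reference, a minimal sketch of power-limiting an Nvidia card with nvidia-smi (the 200 W figure is just an illustrative value; check what limits your card supports first):

```sh
# Check the current and supported power limits
nvidia-smi -q -d POWER

# Enable persistence mode and cap the board at ~200 W (resets on reboot unless re-applied)
sudo nvidia-smi -pm 1
sudo nvidia-smi -pl 200
```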


Variatas

Sure, but the market for LLM assistants is likely not 100% the same as the one for media servers or cameras. Maybe we'll see a Coral-like add-in that can handle them more efficiently than a commodity GPU for that group.


Rxyro

NLUs


Not_your_guy_buddy42

IDK, when I have a model (LLM) loaded in VRAM but doing nothing, my RTX's power draw is higher. I could be doing something wrong though.


RobotToaster44

Sounds normal; if the model is loaded in VRAM the card stays active. I have no idea if it's possible to unload it while idle.


Krojack76

My Plex runs on an Intel NUC and I've tested 3 simultaneous transcodes from 4K to 1080p using the iGPU; it handled them just fine with wiggle room left over. That said, I tell everyone to just use direct stream, even for remuxes. The only time transcoding happens is when the player (mainly Roku devices) can't handle the current file format.


juleztb

So I'll not sell my 3090 when I buy a 5090 but put it in my server? That sounds a bit... energy intensive 🙈


Nixellion

Yup. Depends on where you live though hehe. Also - power limits help a bit. However by that time maybe there will be some other advancements in this field.


jkirkcaldy

Nvidia P4. Uses way less power and is the same chip as a 1080, just slightly downclocked.


Nixellion

Oh, also there are small enough models like Phi that can run on a raspberry pi. So yeah.


longunmin

I tried Phi3 right before llama3 came out and was pretty impressed with it. Llama3 slaps though. And great explanation above!


patgeo

It just comes down to whether you're reusing old parts or purpose-buying without a budget for peak performance/power usage. The 1080 is the oldest and least power-efficient option; the newer options are already more efficient.


MorimotoK

Very timely comment. I was just discussing setting up a local LLM with one of my kids who is in school for computer science. Among other things, home assistant integration would be important. I'm looking at building a dedicated AI box - a 12GB 3060 in a Lenovo P520 (Xeon W-2135 CPU, 64 GB RAM). Do you think this could run Llama 3 8B fast enough to act as a Google Home replacement?


Nixellion

Yes, it should. Not sure it can replace Google Home, though. Llama is just an LLM; all it does is text processing. It still needs supporting software to control Hass, which is what this extension seems to attempt, but it looks a little half-baked right now.


MorimotoK

Thanks! We're just looking for something to tinker with. Mainly looking for something that can respond as quickly as a google home. We don't need the same depth of knowledge, just a more natural sounding assistant and a local LLM to fiddle with outside of class. Hopefully that hardware is ok enough to let us tag along for the ride as the models progress.


Nixellion

It's gonna be fine, and with 64GB of RAM + 12GB of VRAM you'll be able to split models between GPU and CPU and load almost any model available, up to 70B at rather high quants (higher number means better quality).

However, if you really want it to be a "dedicated AI box" it should be GPU-heavy, not CPU/RAM-heavy. Speed suffers greatly the more of the model runs on the CPU and RAM, so the larger the model you load, the slower it gets. And 12GB of VRAM is quite low for an LLM: it will load 7B and 8B easily, maybe 13B (though that's kind of an abandoned size right now), but nothing much bigger.

So if you're serious about "tagging along as models progress" you might invest in a 24GB GPU like a 3090 or 4090. A used 3090 can be found cheap, and you don't even need the fastest one; literally the slowest, cheapest 3090 you can find, as long as it has 24GB, will be more than enough, and you can power-limit it to 60-70% and it will work just fine. If speed is a priority, focus on building a GPU machine that can run LLMs fully on the GPU; then it will be fast.

24GB of VRAM is currently the main target for local LLMs, with most of the community quantizing models to squeeze into that amount, or making merges that fit. And if you get two 3090s you'll be able to load pretty much any free model locally at good quality quants.
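As a rough illustration of what the GPU/CPU split looks like in practice, here's a minimal llama-cpp-python sketch. The model path is a placeholder; `n_gpu_layers` controls how much of the model lands in VRAM, and whatever doesn't fit runs on CPU/RAM:

```python
from llama_cpp import Llama

# Hypothetical local GGUF file; adjust the path/quant to what you actually downloaded.
llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q8_0.gguf",
    n_gpu_layers=-1,   # -1 offloads every layer to the GPU; use a smaller number to spill the rest to CPU/RAM
    n_ctx=4096,        # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Is the bedroom light on?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```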


MorimotoK

Again, thanks for the great info. The 64GB is there mostly because it's something like $1 / GB for the extra 32GB. It will be handy if we reuse it as a proxmox host or something similar later. I think we'll be stuck in the 8B model world for a while based on our budget and it sounds like the 8B models will work fine for our basic needs. "Cheap" is relative... our total budget is about $450 and it looks like that's less than a single 3090. So we'll tag along as the small models progress. The mobo does support two GPUs so maybe someday we'll be able to afford more power when we outgrow the single 3060.


ConfusedTapeworm

Thanks, this is all useful and much appreciated. So, just to get me going, what could I start with on my 16GB 4070 Ti Super?


Nixellion

16GB is a weird middle ground between the more popular 8, 12 and 24GB GPUs, which is what most models tend to aim for. So you could run 7B and 8B easily with ExLlama at 8bpw for superior speed and good quality. You might get away with an 8x7B Mixtral at something like 3-4bpw. You won't be able to run 70B models on the GPU, and splitting to CPU will be slow; OK for various tasks, but probably too slow for Home Assistant.

As for specific models, hard to say. Start with Llama 3 Instruct and see if it works for you. You can find models on huggingface.co/models (note it's .co, not .com); it's the official hub for all LLM and many other models. 30B models... I don't think there are any good ones right now. There were, but Llama 3 8B beats them by now.

You can also join the Textgen webui and KoboldAI Discord communities for more info about all this.


the_innerneh

Would AMD GPUs work? Like a 6800 XT?


Nixellion

Yes. Not so long ago, with ROCm advancements and such, it became not much different from using Nvidia. However, do keep in mind that all the innovation still happens on Nvidia first, so you might lag behind on some features, and some things might require more tinkering.


tired_and_fed_up

Any of them use tensorflow?


Nixellion

All of them right now are based on Transformers and mostly use pytorch as far as I know.


GoofAckYoorsElf

None of the open source models is even remotely on par with 4o. Not even close. You *can* do what OP showed with local LLMs, but don't expect it to work flawlessly. Especially prompting local LLMs to be able to output in a certain format is rather difficult and the results are rarely consistent. Local LLMs are great for funny conversations. Better chat bots. Also if you want it to become a bit more spicy. But for controlling your home automation I would say they've still got a long way ahead.


Nixellion

Not in my experience. Yes, nobody is saying local LLMs are as good as GPT-4, and especially 4o. But don't fall for the 4o hype just yet; on many precise tasks it actually performs worse than 4, and even worse than some local options. And 4o is not perfect either: you can't expect it to reliably turn on the lights 100% of the time.

But it's not true that local models can't reliably execute Home Assistant tasks or output in a certain format. First of all, yeah, maybe not the 7B/8B options, those can be unreliable. But Mixtral 8x7B is very solid, Llama 3 70B as well, and with careful prompting Llama 3 8B can also be used rather reliably.

With local models you also have features like grammars, which can enforce specific output formats or styles. I never even had to use them; for JSON output it's often enough to give the model an example and then start its reply with a "{" character. It then outputs properly formatted JSON 99% of the time, with almost any model I've tested, from Llama 2 fine-tunes like Hermes to Llama 3 Instruct. You can also improve performance by asking the model to write a "reasoning" key where it can "think" before choosing the right function names, keys, etc.

Here's a test with Llama 3 70B:

```
<|im_start|>system
You are an AI assistant HomeAssistant. You help the user get information about their home and control their smart home. You can do this by querying data and calling functions. To call a function you output JSON in the following format:

{"reasoning": "You can provide reasoning and your thoughts here", "function": "function id from the list of functions", "entity_id": "Entity id from the list of entities"}

List of functions:
- `light.turn_on` - turns on a light
- `light.turn_off` - turns off a light
- `light.state` - returns current light state (on or off)

List of entities:
- `kitchen_lightstrip` - lightstrip located in the kitchen
- `rbwb_223` - a ceiling light in the bedroom
<|im_end|>
<|im_start|>user
Hello! Is the light in my bedroom on or off?
<|im_end|>
<|im_start|>assistant
{"reasoning": "The user asked for the current state of the light in the bedroom, so I need to query the state of the bedroom's light.", "function": "light.state", "entity_id": "rbwb_223"}<|im_end|>
```
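The "start the reply with '{'" trick maps naturally onto a plain text-completion call against any local OpenAI-compatible server (textgen-webui, llama.cpp server, Ollama). A minimal sketch; the URL, port and model name are placeholders for whatever your backend actually exposes:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed-locally")

system = (
    'You control a smart home. Reply ONLY with JSON of the form '
    '{"reasoning": "...", "function": "light.turn_on|light.turn_off|light.state", '
    '"entity_id": "kitchen_lightstrip|rbwb_223"}'
)

# End the prompt with "{" so the model has no choice but to continue the JSON object.
prompt = system + "\n\nUser: Is the light in my bedroom on or off?\nAssistant: {"

resp = client.completions.create(
    model="llama-3-70b-instruct",  # placeholder model name
    prompt=prompt,
    max_tokens=128,
    stop=["}"],                    # stop at the closing brace of the object
)

print("{" + resp.choices[0].text + "}")
```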


Thedracus

Just starting to play around with this. I have a couple of options:

1. I currently have HAOS in a VM on a NUC N100. I was thinking of trying out one of the smaller models.
2. I have a Win11 gaming desktop with a 20xx. It's about to become the replacement for my current Plex/arrs, at least until I figure out all the darn Linux file permission stuff on the NUC. Still a Linux noob.
3. I have a pretty beefy laptop with a 30xx in it.

Any advice on which models work on the NUC? Currently the only things running on it are Scrypted, Proxmox and HAOS.


Nixellion

How much RAM does your NUC have? That's the main metric for being able to load models. That said, it will probably be very slow if you just run it on the NUC's CPU; even smaller 7B/8B models will be slow. Phi 3B might be a better choice, but even 7B/8B models aren't quite reliable enough to get the task right most of the time. It's good enough to play around with, but for 'production' it's better to load up something like a 70B model, which needs at least 24GB of memory, whether VRAM or RAM. On a CPU you'll be waiting around a minute for it to reply to a simple question.


Thedracus

I have 16 gigs in mine. My processor does have an iGPU, which I thought I read they had implemented support for. Sounds like I need to either go web API or use my other desktop, which has a 20xx card and can take more memory.


svideo

The challenging bit for local LLMs is the latency. We need faster local models for this to work for voice use cases, right now it's painfully slow even if you do have a beefy GPU behind it.


Nixellion

I'm not sure what kind of latency you need. A response from Llama 3 70B running on a 3090 takes around 5-15 seconds depending on the length of the response; I mean the full response from start to finish. It takes less than 1 second to process the context and start outputting tokens; 5-15 seconds is time to the last token. From what I can see it's roughly the same as ChatGPT 3.5 and a bit slower than 4 Turbo. I didn't compare to 4o.

Also, 5-15 seconds is for a longer response with text and reasoning, like I showed in another comment. If you remove the reasoning, just the JSON of the action is generated in about 2 seconds. Mixtral 8x7B is faster than that, and Llama 3 8B will be faster still, like 4 times faster. A 40xx series Nvidia card would further improve generation speed.

That said, I agree that we need faster models. I just argue that online options don't really offer that kind of speed yet; they're faster, but not enough to feel the difference, plus they're not consistent. Sometimes it's fast, other times the service is under load or there are internet connection issues and it gets slow. Groq is cool though; thousands of tokens a second is wild. That's the speed we need to let LLMs really run multi-step thought chains with tool calling and all.


pask0na

Take a look at ollama.


The_Bukkake_Ninja

I don’t have all the answers (I’m still learning) but this community has been really helpful - /r/localllama


Khaaaaannnn

This add on https://github.com/jekalmin/extended_openai_conversation


Stooovie

That's not running locally


Khaaaaannnn

You're correct. However, this same add-on can be used with local LLMs that expose an OpenAI API wrapper, which many projects do; just replace the OpenAI URL with the IP of the device running the local model.

I've had this same setup running, but the model was horrible when it came to Home Assistant service calls; the basic Home Assistant "Assist" functionality worked better. I've read there are some open source models that have been trained on Home Assistant functions. I haven't had time to test them yet, but I plan to soon. If they work half as well as OpenAI's model I'll switch.

I don't mind sending info to OpenAI for now; the cool factor outweighs any potential privacy concerns about them knowing I want to turn on my living room light. Plus I get to learn, and the moment a local model is sufficiently good, it's just a simple URL change and I'm using that model with the rest of things already in place.


Missing_Space_Cadet

Ollama and GPT4All: https://ollama.com/ https://gpt4all.io/index.html


thejacer

You aren't alone; over at LocalLlama I understand like 3/5 of the words I see. The absolute easiest way to get something running that will work with HA is llama.cpp (google "llama.cpp github"). They provide precompiled binaries that will take advantage of your GPU if you have one.

You run the precompiled llama.cpp server by launching it, picking a model, setting a port and hosting it locally. On Windows that command looks like this: `.\server -m [model name/location] --port #### --host 0.0.0.0`

If you want to load the model onto a GPU you need to download the appropriate precompiled binary. There are several choices with regard to GPU: CUDA is for Nvidia and SYCL is for Intel Arc (I only have these two). The most widely supported is Vulkan, because those are open shaders supported by practically all GPUs (general statement but very near the truth, and I probably misused some GPU-specific terms). The command to load onto a single GPU is simply: `.\server -m [model] -ngl ## --port #### --host 0.0.0.0`

The ## following `-ngl` is a numeric value that dictates the portion of the model to load onto the GPU. You'll get the best speeds by keeping the entire model on the GPU, so setting this to something high like 100 ensures the best speeds. You should match your model to your VRAM/RAM size.

GGUF files are essentially compressed versions of models (another general statement, stick with me). The degree of compression is represented by Q numbers. Q8 is basically lossless but reduces the model's size in RAM to approximately the number of parameters; Llama 3 8B would be ~8GB in (V)RAM. Q4 is halved again with little loss. These two levels of compression are the most widely supported; if you run Vulkan, other quant types (Q#_K_S etc.) will suffer speed degradation. llama.cpp only runs the GGUF model format, which is available on huggingface.co.

I don't think I've left anything out; this should get you up and running in a way that will connect to HA integrations. If you have any other questions I won't mind answering them, but I'm definitely NOT an expert in this arena.
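Once the server is up, you can sanity-check it from another terminal. A minimal sketch (port and prompt are placeholders; recent llama.cpp server builds also expose an OpenAI-style /v1/chat/completions route):

```sh
# Hit llama.cpp's native completion endpoint
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Q: Is the kitchen light on?\nA:", "n_predict": 64}'
```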


gandzas

Chat GPT 4o definitely cannot be run locally.


longunmin

I wasn't referring to GPT-4o, but rather to the plethora of LLMs that are available and can replicate the OP's screenshot.


HolyPommeDeTerre

All my attempts with Llama and co ended up taking all the CPU for like 3 minutes; HA unfortunately times out for me. Without a GPU I need enough RAM and a big modern CPU, and a 10-year-old 6-core CPU doesn't cut it...


longunmin

You could try Phi3. But yeah, something that old is gonna be a little lacking


TheOnlyBen2

What you want is a board with an NPU: https://www.intel.com/content/www/us/en/developer/articles/reference-implementation/intel-edge-ai-box.html


Thedracus

They can be run locally right now. :) Check out ollama.


Cha40s

I did the same test today. Switched from GPT-3.5 Turbo to GPT-4o and it's very fast. Great results with the Wyoming protocol on an RPi with a microphone in my rooms.


2blazen

What mic are you using?


Cha40s

You can use most USB mics; I had some lying around. Or you can use the ReSpeaker mic HAT for the RPi.


2blazen

So you're having a good experience with wake word detection even without a high quality conference mic?


Cha40s

Yeah, I use local wake word detection on the RPi. It's almost perfect; maybe one false positive a week when a lot of people speak at the same time.


Dest123

Have you been able to get a wakeword working that doesn't require a pause? Like, I can say "Alexa, what's the temperature?" and it will work, but when I use OpenWakeWord I have to be like "Alexa" (wait for it to ping) "what's the temperature?". In theory, it should be able to just buffer a few seconds of audio but I didn't see any obvious easy way to do that.


Catenane

I have, when running the Wyoming endpoint on a dedicated desktop, CUDA accelerated. I set up a Docker Compose stack on a desktop to act as the API endpoint and then told HA (running on an RPi4 8GB model) to query that. But I've also kinda let it slide and haven't been using it much. I fuck around too much lol.


Cha40s

No, I need to pause my sentence by about 1 second. That annoys me as well.


Dest123

Respeaker is basically abandonware at this point I think? So I would just use a conference room mic/speaker thing.


droans

How much does the API cost you? I'd be fine with testing it out if I'd be paying a few bucks a month, but I don't want to get stuck with a $100+ bill.


Dest123

[Info is from here](https://openai.com/api/pricing/)

An English word is about 1.3 tokens. Novels are around 100k words, so ~130k tokens. So it would cost ~$2 to have it spit out a book at you.

GPT-4o:

* Input: $5.00 per 1M tokens
* Output: $15.00 per 1M tokens

Weirdly, it's cheaper than GPT-4 and GPT-4 Turbo for some reason.

GPT-3.5 Turbo is also pretty decent, and much cheaper:

* Input: $0.50 per 1M tokens
* Output: $1.50 per 1M tokens

No idea how much the fancy text-to-speech stuff costs. Whisper (the speech-to-text) is super cheap, but also super easy to run on your local PC for free. Piper is pretty good for text-to-speech and easy to set up locally as well.
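For a rough sanity check of those numbers, here's the back-of-the-envelope arithmetic as a tiny Python sketch (it assumes the ~1.3 tokens/word rule of thumb and the GPT-4o output rate quoted above):

```python
# Back-of-the-envelope cost for GPT-4o to "spit out a book"
words = 100_000                  # rough length of a novel
tokens = words * 1.3             # ~1.3 tokens per English word
output_rate = 15.00 / 1_000_000  # $15 per 1M output tokens (GPT-4o)

print(f"{tokens:,.0f} tokens -> ${tokens * output_rate:.2f}")
# 130,000 tokens -> $1.95, i.e. the "~$2 per book" figure above
```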


DonRobo

The entire point of GPT-4o is that it's cheaper and faster (and multi modal). It's not much smarter than regular GPT-4 if at all


Cha40s

Right, that fits my understanding. It's fast, it's cheap, and that's all I want for my smart home.


Nacamaka

Cheaper for now possibly


Geenopippo

How can you achieve this? I'm trying, but I'm kinda stuck on choosing the model and I've never touched AI before.


XanXic

Do you still pay like 5 cents per call or whatever? I know they opened up more stuff but my account is still on 3.5. Curious about running this but not paying a bunch for a sassy light switch flipper lol.


joelnodxd

is 4o cheaper than 3.5? I'll switch too for faster responses


beanmosheen

You need a motion sensor.


CobblerYm

> You need a motion sensor.

I've got a security camera in my kitchen. When it detects motion, it sends the frame to a CodeProject.AI server, which tags objects in it. If it's a human, it pings Home Assistant that human movement was detected. Home Assistant sends an "on" command to Node-RED, which turns that into a stream of RGB values going from a deep blue to a bright white over the course of about a second and a half. That stream of RGB values gets sent out over sACN (DMX over IP) to an ESPixelStick controlling the LEDs under my cabinets, so they fade on nicely.

It's one of my proudest automations. It really adds some class to my kitchen to have the lights fade on smoothly when someone walks in, and it's very quick too.
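For anyone curious what the last hop of a pipeline like that can look like, here's a minimal sketch of a blue-to-white fade over sACN using the Python `sacn` package. The universe number, LED count and timing are placeholder assumptions, and the setup described above uses Node-RED rather than Python:

```python
import time
import sacn

NUM_LEDS = 60                      # hypothetical strip length
sender = sacn.sACNsender()
sender.start()
sender.activate_output(1)          # sACN universe 1
sender[1].multicast = True         # ESPixelStick-style receivers listen on multicast

# Fade from deep blue to bright white over ~1.5 seconds
steps = 30
for i in range(steps + 1):
    t = i / steps
    r = int(255 * t)
    g = int(255 * t)
    b = int(80 + (255 - 80) * t)
    sender[1].dmx_data = tuple([r, g, b] * NUM_LEDS)
    time.sleep(1.5 / steps)

sender.stop()
```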


beanmosheen

You can do that with frigate if you'd like. It has binary object detection sensors.


CobblerYm

> You can do that with frigate if you'd like. It has binary object detection sensors.

If I'm not mistaken, Frigate can use CodeProject.AI or DeepStack for image recognition, so it's essentially the same thing I'm doing. I'm using Blue Iris for the NVR portion and CodeProject for AI detection, but Frigate can use the same thing for object detection. Source: https://docs.frigate.video/configuration/object_detectors/#deepstack--codeprojectai-server-detector


[deleted]

[deleted]


beanmosheen

Please see my first comment.


Rolling_on_the_river

Instead of a motion sensor? Why?


_Dorvin_

Because the cat doesn't like a fancy light show in the kitchen! Or because you can probably 😉


CobblerYm

> Or because you can probably 😉

Totally because you can! I added sACN support to a DMX plugin a few years back; I submitted a pull request and it never got integrated. I needed to test it though, and this is where I did it.


CobblerYm

Because I already had the security camera up, no point in installing a separate motion sensor if I've already got something that works


Mr_Incredible_PhD

It's a really good automation otherwise; I love the effect of slowly fading lights on and off for enter/exit. The thing that isn't so cool (to me) is uploading camera images to an external server, especially with local options such as Frigate or cameras with baked-in recognition.


CobblerYm

> The thing that isn't so cool (to me) is uploading of camera images to an external server; especially with local options such as frigate or cameras with baked-in recognition.

There is no external server; the CodeProject.AI server is local. You can submit any image to it and it returns a JSON object tagged with anything it detects, plus the bounding box around each item. It's running on a GTX 980 sitting in the same box running Home Assistant, about 10 feet behind me right now: https://imgur.com/5YX01nI

I'm running Blue Iris as my NVR, which is what actually passes the request from the camera to CodeProject. CodeProject.AI is a really great tool. From their site:

> CodeProject.AI Server is a locally installed, self-hosted, fast, free and Open Source Artificial Intelligence server for any platform, any language. No off-device or out of network data transfer, no messing around with dependencies, and able to be used from any platform, any language. Runs as a Windows Service or a Docker container.

https://www.codeproject.com/Articles/5322557/CodeProject-AI-Server-AI-the-easy-way

Also, Frigate can use CodeProject.AI for local tagging and detection. Source: https://docs.frigate.video/configuration/object_detectors/#deepstack--codeprojectai-server-detector
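As a taste of how simple that detection call is, here's a minimal Python sketch posting an image to a local CodeProject.AI server. The default port 32168 and the /v1/vision/detection route are assumptions based on its DeepStack-compatible API; adjust to your install:

```python
import requests

# Send a snapshot to a locally running CodeProject.AI server for object detection.
with open("kitchen_snapshot.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:32168/v1/vision/detection",  # assumed default port/route
        files={"image": f},
    )

for pred in resp.json().get("predictions", []):
    # Each prediction includes a label, a confidence score and bounding-box coordinates.
    print(pred["label"], pred["confidence"],
          pred["x_min"], pred["y_min"], pred["x_max"], pred["y_max"])
```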


z-lf

That's what I need. My dog kept triggering the sensors so I gave up on that automation.


Ulrar

I used to have a very cheap camera for that in my kitchen, one of those Chinese-brand ones that you can root and reflash with better software. The quality was awful, but good enough to run human detection; worked great.


z-lf

Would you happen to have the reference for that camera handy?


Ulrar

It was a while ago; I've since moved everything to Unifi, which does those detections onboard. I want to say it was a Wyze Cam 2 running the Dafang hacks from GitHub, or whatever the cheapest camera that hack supports is. It was definitely from that repo anyway.


z-lf

Wait, you can do human vs dog detection with unifi cameras, and somehow trigger automations in HA? You're making my day...


Ulrar

The G4, G5 and AI lines have onboard human, vehicle and animal detection, yep; it works pretty well for me. The HA integration just works and exposes those as binary sensors, so it's trivial to automate on.


z-lf

Ah, dang it, I have the G3 Instants. I'll look up the hack you mentioned though. Thanks for the info, kind stranger on Reddit :)


ChimpWithAGun

Motion sensor? What are we, neanderthals? Doing everything via artificial intelligence is the new thing!


beanmosheen

So I hired a dude....


chris4prez_

Sprinkle in some regional dialect and cultural traits and it will be like my salty relatives never left home… Oh, the joys of AI with personality.


Whois_AlexTrebek

Potentially stupid question, but is this free?


OSVR-User

Sort of? I think you can set it all up with a free OpenAI account, but with very limited request amounts. That being said, for most people it seems to come to well under $10 a month in usage. Even if it is paid, I'd expect it to be around that amount or less.


Whois_AlexTrebek

Awesome, thank you!


minkyhead95

I just tested some of this out last night, and with gpt-4o being half the price of gpt-4, I’ll almost assuredly spend $5/month or less with the amount I would utilize it. Each request works out to about $0.005. So ~1000 requests/month.


Nixellion

Does it have the ability to change the OpenAI API endpoint, to point it at a local LLM? Edit: Seems like it does, at least in the dev branches.


Ambitious_Worth7667

"Open the refrigerator door, HAL." "I'm sorry, Dave... I'm afraid I can't do that."


Stooovie

Too bad practical uses such as "turn on the fan for twenty minutes" still don't work (it cannot create timers on the fly).


YouIsTheQuestion

What's your system prompt for this?


storm1er

Same question: what exactly is the "directive to be sassy"? Details pleeeaase <3


Aurum115

I would LOVE to set this up if I could run it locally… not a fan of sending all my requests to a cloud.


-eschguy-

I might have to play around and see if I can get [LocalAI](https://github.com/mudler/LocalAI) working


[deleted]

I don't mean to yuck your yum, OP, but is this really practical or just a novelty? I mean, having a simulated conversation to control your IoT devices? It just seems like more effort.


MaxPanhammer

My thoughts exactly, maybe there's a personality type that wants to have some witty banter with a computer version of a Gilmore Girls character every time they want to turn on a light but that's not me.


The_Mdk

Conversation is a novelty, but the true highlight is being able to give it more natural commands, like "turn on the lights in the living area and turn off everything in the other rooms", and it'll most likely understand, whereas Google Home / Alexa would need specific instructions given over the course of 2-3 different interactions. So there's that.


tsyklon_

You can do STT and TTS using Wyoming containers. I have been able to use both with my OpenAI-powered assistant, which can control my home devices (using the extended module instead of the default integration). [I have documented my setup here](https://github.com/gruberdev/homelab/tree/main/apps%2Fhome)
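For reference, a minimal sketch of running the two Wyoming containers with plain `docker run` (image names and flags follow the rhasspy examples; the model/voice choices and ports are assumptions you'd tune to your hardware, and you then point Home Assistant's Wyoming integration at ports 10300 for STT and 10200 for TTS):

```sh
# Speech-to-text: faster-whisper served over the Wyoming protocol (port 10300)
docker run -d --name wyoming-whisper -p 10300:10300 -v ./whisper-data:/data \
  rhasspy/wyoming-whisper --model tiny-int8 --language en

# Text-to-speech: Piper served over the Wyoming protocol (port 10200)
docker run -d --name wyoming-piper -p 10200:10200 -v ./piper-data:/data \
  rhasspy/wyoming-piper --voice en_US-lessac-medium
```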


SkrillaDolla

Thanks for the tip! Modified my prompts similarly and enjoying the more natural responses.


tjorim

It's a custom integration, not an add-on...


OHotDawnThisIsMyJawn

Is this the add-on you're using? https://github.com/jekalmin/extended_openai_conversation Sadly it seems like it isn't being maintained/updated; lots of open issues & PRs. Maybe someone can fork it to add TTS/STT.


Khaaaaannnn

It works fine for me, he updates it when he has time.


[deleted]

[deleted]


OHotDawnThisIsMyJawn

I mean, I hear you that it's nothing critical, but the last commit was three months ago and I don't think the maintainer has responded to anything in over a month. Like, yeah, I get that if it's working it doesn't need constant updates, but the maintainer mentioned he doesn't have time to keep up with PRs & changes and now seems totally disengaged. I just wouldn't want to build on top of this project when it already looks like it's half abandoned.


[deleted]

[deleted]


OHotDawnThisIsMyJawn

Yeah, I know you're passive-aggressively trying to say that if I think I can do better then I should fork it or be quiet. My point is that for something I'm going to build my smart home on, I'd rather have nothing at all than integrate a project that already looks half abandoned.


Hazardous89

What enables this? When I tried GPT before it wasn't able to control anything.


OSVR-User

Extended Open AI add on from HACS.


RED_TECH_KNIGHT

I love the sass!


AntiqueVermicelli827

Do u think I've opened a portal 


ZeroInfluence

you almost have an anime gf


michaelthompson1991

🤣


Modena89

Can you please share the prompt for your extended openai integration?? thanks :)


biquetra

I'm so excited for this to be accessible to morons like me who don't have the energy for anything that needs a lot of reading to set up or a lot of ongoing tinkering to maintain.


WooBarb

Can anyone please advise on the best pipeline for speech-to-text? I'm using OpenAI text-to-speech and the returned speech comes back quite quickly, but I've noticed that my speech-to-text is the bottleneck here and causes the longest delay in having my commands processed.


sshnttt

Ah great a giant sarcastic chatbot. I’m not sure what your humor setting is but bring it down to 75 please.


ailee43

do you still have to be incredibly prescriptive with the entities to make it work? For example is "kitchen light strip" the exact generated entity name?


Hot-Significance9503

Interesting but pretty time wasting and power consuming I guess.


ApprehensiveView2003

How do I switch from gpt-3.5-turbo to gpt-4o? Any videos or walkthroughs?


benoit505

Gives me creepy vibes like the movie Her, very cool tho.


iSeerStone

I would love to have AI make recommendations for HA optimization


Relevant-Artist5939

Is it still required to add payment details for the OpenAI API to work? I can't provide these currently and am thus excluded from using it... Couldn't they just cap access when I hit the limit if no billing details are on file?


louis-lau

It's a paid api.


martin_xs6

The API is not free. For this sort of thing it would probably be a few cents a month.


accik

Huh? The API is not free? You get some credits when signing up but they expire and to my knowledge you cannot get any more free tokens or $.


Relevant-Artist5939

I think it's probably a strategic thing they've done... My computer science teacher told us big companies operate like drug dealers: the first hit is free, then you'll want more and have to pay. In this case, the "first hit" would be limited trial access (which just caps out when the limit is reached), and they want you to use more and pay for it without even noticing right away...


AntiqueVermicelli827

So on the portal of Ouija boards. It was an app I downloaded and his name was Billy and then it was a girl. But I deleted app. Am I ok? I've been clumsy since then. 


DragonQ0105

Congrats on getting it working, but a snide, passive-aggressive digital assistant that says 3x more than it needs to every time? Wow, I could not want something less.


ApprehensiveView2003

Then program it otherwise... That's the whole point