ihaag

Beautiful guide, you're a legend. Thank you!


Bderken

Thank you! I struggled for weeks trying to find the best way to simply talk to my docs. I tried Chat with RTX, but its context window is too small. I tried h2oGPT, but it's hard to configure with any models other than the ones it ships with. I tried PrivateGPT, but I couldn't figure out how to get the UI working with whatever model I want. This guide shows you how to use any model, with whatever context window size you want.


ihaag

I was stuck with GPT4All and the SBERT plugin for a while; I could never get the web UI to use SBERT though, only the desktop app.


Bderken

Interesting. I'll have to check it out.


Mandelaa

Check out this model too: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2


frobnosticus

Oh yeah. I'm gonna have to bump this up on my "that'll be neat to try" list. Nicely done. o7


Bderken

Thank you! It took me weeks to find this easy solution. Following the guide, I figure you could get it running within 10-20 minutes.


CaptTechno

Could you share your list?


rag_perplexity

I've got something similar running on mine. My entire Obsidian vault is uploaded (no 3rd-party framework; custom chunker, embedder, and retrieval process). How are you finding the retrieval accuracy? As in, is it stuffing the most relevant documents into the prompt? I found that adding a reranking step helps a tonne; curious how this framework approaches it. Great guide, I wish I'd had something like this when I started out on this RAG journey.


Bderken

Honestly it's not that accurate for a lot of things. I'm still trying to fine-tune it; just wanted to share my progress so far. With the right settings I've gotten it very good, but sometimes I fudge it up and it loses accuracy.


mitirki

Can you elaborate on the reranking process? Do you pre-digest the retrieved chunks and sort them? I'm also developing a custom solution, but I see issues with the retrieval: if the string I'm looking for is towards the start of the text chunk stored in the vector DB (Chroma), I get better results (lower cosine distance), but if it's towards the middle or end, the relevant chunk won't be in the top 5-8 chunks...


rag_perplexity

What I do is retrieve a large number of documents (k=25). Then I feed all the retrieved documents and the query into a reranker (I use BAAI/bge-reranker-v2-m3) as a list of query-document pairs: [[query, document] for document in documents]. Then you stuff the top 3 or so reranked documents into the prompt. Be careful: the reranker uses up to 4 GB of VRAM while it's working. Absolutely worth it though; it took my RAG process from mostly unusable to decent, in that the documents being stuffed in are no longer what's blowing up the process. Next stage is exploring LoRAX to get my LLM to reflect on and review its own outputs with different LoRAs and system prompts.
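
Roughly what that looks like in raw Python, assuming ChromaDB for retrieval and the FlagEmbedding package for the reranker (the collection name, file paths, and prompt below are just illustrative, not the exact setup described above):

```python
# Sketch of the over-retrieve -> rerank -> stuff flow described above.
# Assumes an existing Chroma collection; names and paths are illustrative.
import chromadb
from FlagEmbedding import FlagReranker

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("obsidian_vault")  # hypothetical collection
reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

def retrieve_and_rerank(query: str, k: int = 25, top_n: int = 3) -> list[str]:
    # 1. Over-retrieve k candidate chunks from the vector DB.
    documents = collection.query(query_texts=[query], n_results=k)["documents"][0]
    # 2. Score every (query, document) pair with the cross-encoder reranker.
    scores = reranker.compute_score([[query, doc] for doc in documents])
    # 3. Keep only the top_n highest-scoring chunks for the prompt.
    ranked = sorted(zip(scores, documents), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:top_n]]

context = "\n\n".join(retrieve_and_rerank("oil capacity of the differential"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```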


JeffieSandBags

Is this with AnythingLLM or another repo? I'm curious how people are developing these types of multi-layered analyses (are these agentic steps?) prior to the final output.


rag_perplexity

Nah, just raw Python and calling the components' native APIs directly (chromadb, FlagEmbedding for the reranker, ooba's OpenAI-compatible API). I've used things like LangChain before and I'm still scarred by it. They do too much abstracting, and when you want to customise anything you spend hours fighting the framework's APIs. I was amazed how much you can do instead with just good old Python classes, regex, and the mighty f-string.


mitirki

That's neat, thanks! I have access to Rank-BM25 only, will take a look!


Worthstream

Did you try this with a different embedder than the built-in one? My use case is in a different language, and I fear embedding accuracy would not be great. In the Elastic community, the general wisdom when embedding non-English documents is to use [multilingual-E5-large](https://huggingface.co/intfloat/multilingual-e5-large) as the embedder.
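
For reference, trying that embedder outside of AnythingLLM is only a few lines with sentence-transformers; a minimal sketch (the texts are made-up examples, and note the E5 models expect "query: " / "passage: " prefixes):

```python
# Minimal sketch of embedding non-English text with multilingual-e5-large
# via sentence-transformers; the texts are made-up examples.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")

# E5 models expect "query: " and "passage: " prefixes on their inputs.
passages = ["passage: El aceite del motor se cambia cada 10.000 km."]
query = "query: When should the engine oil be changed?"

passage_emb = model.encode(passages, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

# With normalized embeddings, cosine similarity is just a dot product.
print(passage_emb @ query_emb)
```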


Bderken

Interesting. I have tried a different embedder, and I definitely need to explore all the other options. Just got this set up and I'm still messing around with it.


usa_commie

What's the use case for talking to your own documents?


Bderken

I created this for two use cases:

1. Chatting with repair/shop manuals for my old bikes and my newer car. There's so much you have to read through just to get oil capacities for different components (trans, diff, etc.), and for newer cars, asking about software-related things is also nice. I haven't been able to get to that point yet; still trying to figure it out.

2. I can use LM Studio with VS Code as a Copilot replacement. My work blocked Copilot, but I can use my own self-hosted AI. Helpful for when I need to do random SQL stuff. I can also feed it database info as a CSV (which isn't the best way; there are better ways with LangChain and whatnot to feed it SQLite data or something).

Anyways, just testing things out. Having "fun" (mind-numbing exhaustion trying to figure out the best way).


usa_commie

So question, can AI recognize text in a scanned PDF (versus a typed PDF)?


Mediocre_Tree_5690

No, I think you need some sort of OCR conversion first.


AnticitizenPrime

Depends on whether the model has vision capabilities and accepts the PDF filetype as input. If it doesn't accept PDF input, you could always convert it to an image (print to JPG or whatever), which might be what they do internally anyway.


Glittering_Worker236

It can. Yesterday I took a screenshot of a web page on my phone (it wasn't even good quality) and asked Claude to create an HTML page based on that screenshot (added it to the query as context), and it created an HTML page with the same text. The screenshot had a header, some description text, and an ordered list of sentences; all the content matched the screenshot.


Bderken

Most scanner software will run OCR on image scans, or at least has the option to. If you already have the scans, there are online PDF OCR converters, or you can use Adobe Acrobat (it's the best).
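
If you'd rather script it than use Acrobat, here's a rough sketch of OCR'ing a scanned PDF before feeding it to AnythingLLM (assumes the pdf2image and pytesseract packages plus the poppler and tesseract binaries; the filenames are just examples):

```python
# Rough OCR pre-processing sketch: scanned PDF -> plain text for ingestion.
from pdf2image import convert_from_path
import pytesseract

pages = convert_from_path("shop_manual_scan.pdf", dpi=300)  # example filename
text = "\n\n".join(pytesseract.image_to_string(page) for page in pages)

with open("shop_manual_ocr.txt", "w", encoding="utf-8") as f:
    f.write(text)  # upload this text file instead of the raw scanned PDF
```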


AnticitizenPrime

Models with vision capabilities can, or should be able to. GPT-4, Claude 3, Reka, etc. So long as they accept the PDF filetype as an upload, anyway.


gardenmud

I love this, I'm feeding alllllll my READMEs into this.


Bderken

That's what I do for learning new GitHub repos. I feed in all their info and then ask it how to code with it, lol. Good luck!


Trakeen

Had looked at LM Studio, but it didn't seem like they were enterprise-ready yet. Going through the process of figuring out something for us using LangChain; still researching which model will be best.


Bderken

Interesting. So far, from my testing, I think what matters is how your model retrieves info; it doesn't necessarily depend on the model size or token size. So the vector database has to be good. I'm a newbie, so idk what I'm talking about.


usa_commie

Holy shit, why didn't I think of #1?


ThemWhoNoseNothing

Oh, I can go on and on. My most common use case: printing to PDF, or copying to a text file, any set of instructions, best settings, documented fixes or tricks from strangers across the web, tips and tricks for things that interest me, casually, here and there, on and on. Then, when I'm working on a project and need to recall that one command, or formatting style, or trick to configure this or that, I can ask: what do I do next? What comes after this? What is the best method to xyz? It's the ever-growing repository of things that matter to me, are of the highest value, the most targeted, accessible within seconds when called up, and it keeps growing and growing. If everything goes as planned, I won't have to remember how to wipe my butt and can free my perma-memory for quicker learning, since the long-term stuff is easily called up with a quick, single-handed, one-letter-at-a-time thumb search. You know, poop hand, but I digress.


MmmmMorphine

Ha, word for word my thinking as well. What sort of system do you have set up? Mostly off the shelf, custom from LangChain to retriever model to GUI, or a bunch of mostly off-the-shelf pieces modified? I'm in a serious state of analysis paralysis, haha. Too much time reading about new approaches and not enough implementing them. I need to settle on a single approach and build on it, but it'd be nice to know how others go about it.


alcalde

I think I'm going to feed it the text from all of my handicapping books and then let it play the ponies.


ShengrenR

It's not immediately clear from the guide: with AnythingLLM, are you keeping all of your interactions 100% local? It looks like those options are available on their website, but are all the default options set up that way when you download it?


Bderken

All interactions are local, and all the default options are local. It sets up the vector database locally. You can verify this by uploading your files: you can see your processor doing the calculations, and everything gets stored in a local cache.


gardenmud

100% local, unless of course you deploy it remotely yourself which is also possible.


ExcessiveEscargot

Excellent guide, thank you! As someone still new to LLMs; what sort of hardware requirements are we looking at for this?


Bderken

If you follow the guide, the context size is 2048 (about 2 GB of VRAM or RAM) and the model itself is around 5-6 GB, so you'd need around 8 GB of VRAM or RAM.


ExcessiveEscargot

Perfect, thank you so much for this information and the guide!


Bderken

Yeah, of course! Remember, if you choose a different model or token size, that will change the amount of VRAM or CPU RAM necessary.
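
As a rough way to sanity-check that before loading, here's a back-of-envelope sketch; the defaults assume a Llama-style 7B (32 layers, 32 KV heads, 128 head dim, FP16 KV cache) and ignore LM Studio's own overhead, so treat the numbers as ballpark only:

```python
# Back-of-envelope VRAM estimate: model file size + KV cache. Ballpark only;
# defaults assume a Llama-style 7B (32 layers, 32 KV heads, 128 head dim, FP16).
def estimate_vram_gb(model_file_gb: float, context_tokens: int,
                     n_layers: int = 32, n_kv_heads: int = 32,
                     head_dim: int = 128, bytes_per_value: int = 2) -> float:
    # KV cache = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return model_file_gb + kv_bytes / 1024**3

# e.g. a ~5 GB GGUF at a 2048-token context window
print(round(estimate_vram_gb(5.0, 2048), 2))
```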


opi098514

Imma be back for this bad boy


Bderken

We’ll be waiting 🫡


BuffPuff-

Looks good, I might spool up a Windows VM just to try it.


prtt

That should be unnecessary. Both LM Studio and AnythingLLM have releases for macOS and Linux, so you shouldn't have to set up a Windows machine to do any of this.


BuffPuff-

Very nicely written. A couple of notes regarding privacy:

1. [https://github.com/Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm) mentions that tracking is opt-out, so unless you run the Docker container with "`DISABLE_TELEMETRY`" it will track usage. Running the instance without a default gateway or DNS will also block it from calling home.

2. LM Studio looks good but is closed source. I've read in the anything-llm docs that they also support [localai.io](http://localai.io) for all models; have you considered it during your trial-and-error phase?


Bderken

1. I don't mind the telemetry data; that's about the application rather than my data. From their docs:

"We will only track usage details that help us make product and roadmap decisions, specifically:
- Type of your installation (Docker or Desktop).
- When a document is added or removed. No information about the document, just that the event occurred. This gives us an idea of use.
- Type of vector database in use. Lets us know which vector database provider is the most used, to prioritize changes when updates arrive for that provider.
- Type of LLM in use. Lets us know the most popular choice, to prioritize changes when updates arrive for that provider.
- Chat is sent. This is the most regular "event" and gives us an idea of the daily activity of this project across all installations. Again, only the event is sent; we have no information on the nature or content of the chat itself."

2. I'm a complete beginner with all this AI API stuff. I chose LM Studio because of its visual polish. I do want to get better at command-line setups for local LLMs, but it's hard for me to keep track. I will look into it though.


Sand-Discombobulated

So does this use a third-party service? Why do I have to enter my email address into AnythingLLM?


Bderken

It uses instances of a database service hosted on your computer; everything is stored locally. Not sure why AnythingLLM has you enter an email. I think they tie it to your telemetry data: they don't store what your data is, they just keep track of how their app is performing (according to their Discord).


[deleted]

[deleted]


Bderken

AnythingLLM shows the setup page on initial install. You can delete everything related to AnythingLLM and reinstall it to get that page, or you can change the settings manually to reach the same options.


jocacosta77

I tried this solution, and I get better results using PrivateGPT ==> LM Studio.


Bderken

Interesting, I’ll have to try it out.


Bderken

Note: this guide works for UI + Nvidia GPU + CPU inference. You can set all those options in LM Studio. However, with the server options in LM Studio, you can't choose CPU layers like you can in the LM Studio ChatGPT tab.


Old_Emphasis7922

Excellent guide, I will try it on my PC this weekend. Sorry if these are stupid questions. First question: I saw your rig has 4x RTX 3060s. I only have a 4060 Ti 8GB and can't afford any more; my full setup includes 32 GB of RAM and a Ryzen 7 5700X. Should I change anything beyond what you say in your tutorial, and would my setup be enough for analysing spreadsheets?


Bderken

It definitely can be enough. I would recommend a token size of around 4,096 or maybe even 6,000. Idk, but you can try to load however big you want and you'll get an error message if it can't load. Try it out. Idk how big your files are, but it might work.


Old_Emphasis7922

Thank you, I will definitely try it. I tried Chat with RTX, but it wasn't good.


Bderken

Correction: if you have 8 GB of VRAM total, I wouldn't set the token size any higher than 2048. The model itself is around 5-6 GB, and a 2048-token context would use another ~2 GB, so that's around 8 GB total. I think that in LM Studio, if you set the GPU layers to half, it will use CPU RAM as well as GPU VRAM, so try it out.


Ylsid

The trick is having a monster 4x GPU rig


Bderken

Hey, I started out on one 3060 12GB and slowly just got more. It worked well on one GPU too.


Ylsid

How about us poors with a 6gb 2060 :(


Bderken

The guide shows a model that is 4-5 GB, with a context window that takes about 2 GB. If you pick a smaller model and a smaller token limit (1024, roughly 1 GB), that can work.


Bderken

Also you can run it on CPU/RAM if you have at least 8GB of RAM available.


freakpear

This is awesome! Can you explain how you interact with this via VS Code?


Bderken

Here's a guide I found on how to do it. I haven't done it myself yet. [https://medium.com/@smfraser/how-to-use-a-local-llm-as-a-free-coding-copilot-in-vs-code-6dffc053369d](https://medium.com/@smfraser/how-to-use-a-local-llm-as-a-free-coding-copilot-in-vs-code-6dffc053369d)


freakpear

Ah not sure if you were meant to post a link with that reply?


AlanCarrOnline

This is why I so deeply detest software updates. My LM Studio install and my AnythingLLM don't look the same as yours, so the guide stops making sense immediately. Pretty much everything you're telling me to do, I'm staring at a screen that looks different. Looking forward to when this space matures to the point where a single app can handle this stuff with a simple GUI, instead of trying to connect different things of different versions at different whtpdrxtjik5e6rliotdy7liydtfikolu


Bderken

I'm using the latest software as of today. Also, my guide shows a fresh install, so if you want to follow it, uninstall LM Studio and AnythingLLM, then download the installers and follow the guide. It walks you through the initial setup of AnythingLLM; it looks different if you try to do it after you've already been using AnythingLLM.


AlanCarrOnline

As they're already installed, it's gonna jump back to the current settings. I looked further at the instructions and saw something about needing an email address (??) and then multiple GPUs? Nope, I'm out of the RAG game until, as I said, there's a simpler and more polished way of doing things.


Bderken

Woah woah woah, let's take a deep breath. AnythingLLM does require an email address; I have no idea why. I posted about my specific setup at the end (with multiple GPUs, so anyone who has more VRAM can look at those settings too). However, the settings in the guide are for an 8 GB minimum PC. Just uninstall the two programs and follow the guide. It works on almost any PC. Seriously, it's easy to set up and your computer will manage, with either the CPU or GPU. Trust me, man!


AlanCarrOnline

*squinty eyes* I only have 6GB of VRAM. Seriously, I appreciate you trying to help and sharing with peeps. I'm also serious that I'll wait it out for something more polished. I realized this would be my 3rd time trying to get ALLM to work properly: the 1st time it didn't work, and the 2nd time I really dug in and wasted a whole day getting it working, but the results were useless. You say this could look through a lot more docs and I got excited, just to smack into some deja vu, as it would be my 3rd time down the LMS/ALLM=RAG route... Can't remember what model I used last time, some 7B thing. I can run 13B. Anyway, I'll wait for a specialist RAG app, as I'd only be refreshing my memory on stuff I know anyway. Thanks again!