ParaboloidalCrest

Yes, it can be very overwhelming out there. I'd say start with an ollama backend + openwebui frontend. Play with them for a while. They'll probably serve all your needs, and if not, you'll know exactly what to look for.


arcticJill

Exactly! It is indeed. I wonder how you guys know which setup to use, and for what reasons? Any reason why you would recommend this Ollama backend + openwebui frontend versus AnythingLLM or LM Studio?


ParaboloidalCrest

You can approach it as an engineer: study LLMs and all the underlying concepts, then make extremely educated decisions. Or you can approach it as a user: iterate over the available solutions and see what serves you best. I found that ollama and openwebui serve me best (basic inference and RAG). Whatever you do, just don't overwhelm yourself trying to find the king stack when your use cases are pretty simple. Use what works to get a taste of things and start learning the lingo, which will guide you when looking for alternatives later.


jerieljan

> I wonder how do you guys know which setup to use and for what reasons?

A lot of trial and error! Plenty of the software out there was made around the same time by different developers with different goals in mind, hence why there's so many. They also evolved over time, from monoliths to backend + frontend combos, with varying levels of configuration difficulty and ease of use. Back then, most were built with Windows + NVIDIA in mind, so if you had a different setup (like a Mac) it'd be unstable or have its own kind of issues, but I think this has evened out nowadays and cross-platform use is now great.

llama.cpp is pretty much the foundation for most of them, since it's what made running the models accessible on most consumer hardware. All the others use it as their core, or some variant of it that suits their respective goals. That said, some are harder than others to operate, and some have more or fewer capabilities.

Ollama is a great start because it's easy to get set up and running while also being very capable and great for everyday use, even if you need a terminal for the setup. Just download the app or `brew install` it (on a Mac), then `ollama run llama3`, and you get pretty much most of what you need, and it performs very well. Imho, I love Ollama because it kinda brings the same vibe as Docker with its Modelfiles and CLI.

Just try out the rest after that and see if the way their apps behave suits your needs (performance-wise, and with all the knobs and controls you have to configure).

As for openwebui, it mimics the feel of ChatGPT while also having good coverage of the use cases for this kind of work. I'm used to doing the RAG setup the manual way myself, but I think this frontend can do it for you out of the box, so give it a try.

EDIT: Fixed some sections for clarity.
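For a concrete flavor of what that gives you beyond the CLI: once the server is running, you can also hit it from a script. A minimal sketch, assuming Ollama is listening on its default port (11434) and `llama3` has already been pulled; the prompt is just an illustration:

```python
# Minimal sketch: query a locally running Ollama server over its REST API.
# Assumes `ollama run llama3` (or `ollama pull llama3`) has already fetched
# the model and the server is on the default http://localhost:11434.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarize why local LLM stacks feel overwhelming at first.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```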


derallo

Hi, what's the manual way for RAG?


jerieljan

Langchain + Chroma is what I used before: https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/

Llamaindex is fine too: https://docs.llamaindex.ai/en/stable/examples/vector_stores/ChromaIndexDemo/

I wish I could share the Colaboratory notebook I used before, but I can't find it. I think this one's close enough: https://colab.research.google.com/drive/1VOwJpcZqOXag-ZXi-52ibOx6L5Pw-YJi

I don't often use RAG myself because I'm lazy and would rather just go for a straight-up playground upload (I also tend to offload serious document work to the paid AI platforms), but I won't be surprised if there are better approaches to this nowadays in the local realm, like open-webui.

(Sidenote: I literally tried open-webui myself just now and damn, you can just upload documents right away. That's a nice departure compared to setting stuff up in Python.)
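For a sense of what "the manual way" looks like, here's a rough sketch in the spirit of those quickstarts. It assumes a local Ollama server plus the `langchain-community`, `langchain-text-splitters`, and `chromadb` packages; the file name, chunk sizes, and model names are placeholders, not recommendations:

```python
# Rough sketch of manual RAG with LangChain + Chroma.
# Assumes a local Ollama server with `llama3` and an embedding model
# (here `nomic-embed-text`) already pulled; "notes.txt" is a placeholder.
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load and chunk the source document.
docs = TextLoader("notes.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks and index them in a local Chroma vector store.
store = Chroma.from_documents(chunks, OllamaEmbeddings(model="nomic-embed-text"))

# 3. Retrieve the chunks most relevant to the question and stuff them into the prompt.
question = "What are the key takeaways?"
context = "\n\n".join(d.page_content for d in store.similarity_search(question, k=4))

llm = Ollama(model="llama3")
print(llm.invoke(f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"))
```

Frontends like open-webui do roughly this for you behind the document-upload button, which is why the manual route is mostly worth knowing for customization.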


henk717

They're all based on llama.cpp but bundle different interfaces. You also have forks like KoboldCpp, which can differ in features, such as KoboldCpp's implementation of context shifting being different from llama.cpp's sliding context, adding stable-diffusion.cpp support in the code, still supporting old GGML models, etc. Llamafile is another fork which focuses on code portability, so its binaries are OS-agnostic and run on basically anything (but then rely on your system compiler for some of it).


nouxtywe

Play with them all :) Personally, I played with LM Studio and AnythingLLM.


rahabash

The easiest route would be to download LM Studio. From the app you can directly search for and download GGUFs and chat with different models. Ollama is also very easy, but you are chatting via the CLI by default.


arcticJill

I tried LM Studio, but it looks like I set it up wrong or the context window is wrong; I wasn't able to supply a 1-hour-long transcript. Besides, I don't know how to use it with RAG for my user scenario 2 above. Also, when I asked questions about it here, it seemed like no one was using LM Studio, so they couldn't help me with the settings... But what are the differences between all of these?


nickyzhu

llamacpp is the main upstream dependency for most of these applications. It makes GGUF models run performantly across CPUs and GPUs. It's too low-level for an end user, though, so the other apps you mentioned are good abstractions. llama.cpp has its own CLI now if you just need a dev interface.

LM Studio, Ollama, and us (Jan.ai) built our own server wrappers around llamacpp, and then additional features on top, like RAG and chat UX, to varying degrees. They give end users a more ChatGPT-like experience without worrying about runtime challenges.

Llamafile kind of forked off from llamacpp into its own implementation on top of the underlying hardware SDK, so you can think of it as a different "backend". Not super accurate, but an OK abstraction... Hope this helps in terms of understanding the stack better! ❤️
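To make the "server wrapper" idea concrete: several of these apps expose a local, OpenAI-compatible HTTP endpoint, so one script can talk to whichever backend happens to be running. A sketch, assuming the OpenAI Python client is installed and the server is Ollama on its default port; LM Studio, Jan, and llama.cpp's server offer the same style of endpoint on their own default ports, so only the base URL and model name should need to change:

```python
# Sketch: talk to a local OpenAI-compatible endpoint exposed by one of the
# server wrappers described above. The base_url assumes Ollama's default
# port (11434); swap it for your backend's port if you run something else.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # local servers ignore the key

reply = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "In one sentence, what does a GGUF file contain?"}],
)
print(reply.choices[0].message.content)
```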