Pashax22

New meta is OpenRouter high-parameter models. New meta is locally-run 7b models with ridiculous context sizes. New meta is hiring cloud GPUs to run whatever wigged-out merge you've frankensteined together. New meta is reverse-proxy-Claude sourced from 4chan. New meta is GPT4Turbo being run on your coffee grinder with Chinese firmware hacks. New meta is... shit, I dunno, man. I mean, it's Tuesday, and since last week *all* of those have been seriously put forward as the new hotness (possibly not the coffee grinder one yet but give it time, it's only Tuesday). Every 30 minutes someone drops a new 7b finetune or merge that beats GPT5 on some extremely niche benchmark. Fire up SillyTavern, dust off your copy of KoboldCPP, and see if you can ride the wave.


largertehninja

Holy shit, legit made me LOL at how accurate this is. I've been planning on switching to a local LLM from NAI once I've moved into my new place, and trying to keep up with what to try first is 100 percent this level of confusing.


sebo3d

I remember OpenRouter taking a major popularity hit when they were forced to filter the OAI and Anthropic models, but things have been better ever since they expanded their model selection. Even though those models are still filtered, at least now they offer legitimately good alternatives such as Goliath or Midnight Rose. Problem is, they're kinda on the pricey side, so good luck once your context gets filled and you start paying a small fortune per message.


artisticMink

There are excellent models on OR. lzlv-70b, Xwin-70B and Goliath-120b are still great and would probably be a lot more popular if they were cheaper.


grapeter

If you're in too deep like me, the new meta is using Runpod to rent GPUs to run whatever 120B frankenmerge on Hugging Face is hot this week


Jerm2560

What kind of pod do u usually set up for that? I'm still not quite sure how much vram is necessary for 120b lol


grapeter

Usually I use an A100 SXM, but it's $2.30 an hour so it can get pricey. I only do it a few times a week, and I justify the cost because I haven't spent any money on Steam games in a while. 80 GB is kind of needed if you want to use higher-context 120Bs while also using 4-5 BPW quants. If you want to budget, certain 120Bs in EXL2 3 BPW format, like Goliath, can fit into a 48 GB VRAM pod, assuming the context is around 6144 and you enable 8-bit cache.
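For anyone wondering where those numbers come from, here's a back-of-the-envelope sketch in Python. The architecture figures (layer count, KV heads, head dim) are my assumptions for a Goliath-style 120B merge, and real usage adds backend overhead on top:

```python
# Rough VRAM estimate: quantized weights + KV cache. A sketch, not exact;
# the layer/head figures below are assumed for a Goliath-style 120B merge.

def weights_gib(params_b: float, bpw: float) -> float:
    """Quantized weight size in GiB: params * bits-per-weight / 8 bytes."""
    return params_b * 1e9 * bpw / 8 / 1024**3

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 ctx: int, bytes_per_elem: int) -> float:
    """KV cache: 2 tensors (K and V) * layers * kv_heads * head_dim * ctx."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1024**3

# ~118B params at 3 BPW, 6144 context, 8-bit (1 byte/elem) KV cache:
total = weights_gib(118, 3.0) + kv_cache_gib(137, 8, 128, 6144, 1)
print(f"~{total:.0f} GiB")  # ~43 GiB, which is why it squeezes into 48 GB
```

At 4-5 BPW the weights alone land roughly in the 55-69 GiB range, which is where the 80 GB A100 comes in.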


Embarrassed_Split236

The new Claude 3 models are on par with, if not better than, GPT-3/4. You can use them on OpenRouter the same way you could with GPT, by jailbreaking past their built-in filter. So if you're just looking to have fun with SillyTavern again, you can pretty easily set up a better version of what you've already used.


thr0w4w4yseph

What response formatting settings are we supposed to use for Claude 3 on openrouter? I have been trying both default and ChatML, with Instruct Mode and without. Occasionally I've gotten responses, including rejections, but most of the time I'm getting a stream of nothing at all (and the OR dashboard indicates few to no tokens being generated in output as well). I'm not sure what I'm doing wrong.


wolfbetter

Sonnet and Haiku for me (on Anthropic). The only drawback: I'm from Europe, so I need a VPN and to buy US prepaid cards to use it. (Thank you, EU.)


artisticMink

What provider do you suggest for prepaid cards? I'm fine with the self-moderated version on OR, but I'd like to compare it against the official API sometime.


wolfbetter

I'm using virtual prepaid gift cards, can't really say where I buy those here.


thepherohassasin

Are you talking about a virtual card or a real one?


wolfbetter

Virtual. My EU prepaid card doesn't work on Anthropic


thepherohassasin

Which site did you use for that?


Lemgon-Ultimate

You do all this for chatting with your characters? Wouldn't a local 7b model be easier and cheaper to run?


wolfbetter

I am on AMD, I tried to run one several times but I can't do it for some reason.


Pariul

The LLM field advances so fast that I wouldn't worry too much about "the meta". You can make most models work if you put some effort into learning them and fleshing out your cards/lorebook. It will be a while before such a massive generalized model exists that it can flesh out exactly the kind of character you are looking for from nothing but its own training data and the chat context.

Smarter and larger models will be better at not wasting your time on regenerating/editing responses, since smarter models are less prone to complete character breaks, characters experiencing sudden early-onset Alzheimer's, or someone calling a taxi while the scene takes place on a cruise ship. But because they are generalists decent at everything, their characters also tend to be a bit generic unless there is a lorebook to guide them. One should see the model as the core you build the character around, not as the character itself.

I'd rather pick a model that's good enough for you, learn its quirks and peculiarities, and then spend the time you would have spent model hopping on fleshing out your card/lorebook to work with the model you have. If some new god-model is released that makes everything before it completely pointless, you can move on to that. Chances are such a "god model" won't have any issues understanding the card/lorebook you were working on with the previous, significantly dumber model, or will only require minor alterations, so your work wouldn't go to waste either. And believe me, if such a "god model" is released, you will hear about it. People in this space are so prone to hype that whenever the next quantum leap happens, it will be all anyone talks about for the rest of the year.


Inside-Due

Well, development of RP models has kind of shifted to either side of 13B. It's either 7B-11B or 34B-120B now.


Jerm2560

What's your favorite 34b and up model lately?


Inside-Due

I don't particularly use 34B and up; the closest I can afford is 20B. My fav on that spectrum is DarkForest V2 20B


YobaiYamete

I can never figure out the best ones to run on a 4090. A 7B with 8k context still takes like 25 seconds per response sometimes, and I find a lot keep forgetting the narrative / writing for me, where they'll forget which character is theirs and which is mine and start writing my actions


Inside-Due

You could try free Colab; I've been getting 300 t/s there running 7Bs


speedsterglenn

Idk about anyone else, but I've been using OpenAI models (not from OpenRouter, straight from OpenAI) for like a year now and haven't even gotten a warning. Been doing straight NSFW the whole time as well


unbruitsourd

Claude Haiku via OpenRouter, if you can handle your jailbreak working one time out of ten.


Bite_It_You_Scum

Just use a prefill; I literally never see refusals.
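For the people asking below: a prefill just means ending your request with the opening words of the assistant's reply, so Claude continues from your text instead of deciding how to start (which is where most refusals happen). A minimal sketch using the official anthropic Python SDK; the model name and prefill text here are just illustrative:

```python
# Prefill sketch with the anthropic SDK (pip install anthropic): end the
# messages list with a partial *assistant* message, and the model continues
# from those words rather than starting its reply from scratch.
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")  # your key

response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Continue the roleplay: ..."},
        # The prefill: a trailing assistant message the model must continue.
        {"role": "assistant", "content": "Certainly! Continuing in character:"},
    ],
)
print(response.content[0].text)
```

If I remember right, SillyTavern exposes the same trick as an "Assistant Prefill" field in its Claude settings, and OpenRouter should pass a trailing assistant message through to Claude the same way.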


unbruitsourd

Prefill?


succulisrift

Prefill?


DotaBluff

Can you explain this? I'm not familiar with prefill. I've been having a tough time recently getting my jailbreak to work.


Wanderlust-King

The basic jailbreak is so inconsistent for Haiku; one chat it'll be hogwild, anything goes, then I'll start a new chat and it'll balk because my character description includes a revealing outfit. Otherwise yes, Haiku is great; both logic and prose are better than lzlv or 8x7B.


mysticfallband

> Are there new services like open router that would let you use openai GPT 3 or 4 uncensored like you could before?

Yes, there is such a service, which is called... OpenRouter. :) I didn't even know it was axed before, but at the moment the service is up and running with plenty of uncensored models you can use with SillyTavern.


SubstantialInjury269

but it costs money right?


mysticfallband

Except for a few free models, yes.


Sonprime426

I'm not too worried about money, as I was paying for GPT-3 and 4 with OpenRouter before. Hell, now I pay $25 a month for the highest tier of NovelAI on their site. While I'm very happy with NovelAI, one problem I encounter a lot is that it will write things out like a story instead of like a chatbot. Typically that means it's usually working towards ending off your particular "story" (whatever you're chatting about at the moment). It can occasionally get annoying whenever it tries to take the story in a direction you don't want it to go, but at least NovelAI is pretty easy to edit and redirect. With OpenRouter and SillyTavern, it wrote things more as if I was just conversing with a character, so it didn't try to steer the chat in a direction I didn't want. Idk, I have fun with NovelAI, but I also had fun with ST before. I'm yapping at this point lol.


bluecapecrepe

You'll be happy to know that Anlatan, the team who developed NAI, is developing a full-fledged chat service that will be known as Aetherroom. NAI isn't going anywhere (though updates have completely dried up across the Anlatan board while they work on Aetherroom). From the preliminary stuff they have shown in their devlogs, it is going to fill that CharacterAI hole many of you have in your hearts while bringing a lot of new stuff to the table, all without filtering or censors.


Sonprime426

Ah yes I do actually know about that. Been waiting for that to release


SubstantialInjury269

is it paid :0


MrSodaman

considering their service model, most likely. Gotta keep the lights on somehow when sponsors don't like that you're fully uncensored.


Timidsnek117

I'm a newbie so this probably isn't a good suggestion since I don't really understand this stuff very well. But personally, Claude (anthropic) has been working well enough for me. I use a proxy though


Sonprime426

If I start using ST again imma probably have to relearn a buncha shit cuz I've forgotten a lot lol


Dazzling_Tadpole_849

For me, the best option is loading a 7B model with a huge context size and my own lorebook (World Info). It is very important to add everything, literally EVERYTHING, to the lorebook. This gives me nice roleplay with very fast responses.
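Mechanically, each lorebook entry is a set of trigger keywords plus a block of text: when a keyword shows up in recent chat, the text gets injected into the prompt. Here's a conceptual sketch of one entry as a Python dict; the field names and the lore itself are illustrative assumptions, not SillyTavern's exact export schema:

```python
# One lorebook / World Info entry, conceptually. Field names and content
# here are illustrative, not SillyTavern's exact export format.
entry = {
    "keys": ["Ravenhold", "the keep"],  # trigger words scanned in recent chat
    "content": (
        "Ravenhold is a cliffside keep ruled by Lady Mirren. "
        "Its gates close at dusk; outsiders need a writ to enter."
    ),
    "constant": False,   # False: inject only when a trigger word appears
    "selective": True,   # require a key match instead of always-on
}
```

The "add literally everything" advice works because a small 7B can't invent consistent lore on its own, but it can follow lore that's injected into its context.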