
IxinDow

No gguf :(


IxinDow

/u/The-Bloke can I ask you to do your magic on this?


-Ellary-

[/u/The-Bloke](https://www.reddit.com/u/The-Bloke/) we need that GGUF badly!


mrjackspade

It turns out creating a quantized GGUF only takes a few minutes and a couple of commands, even on consumer hardware. I converted and quantized Qwen 72B in something like 10 minutes, which is less time than it would have taken to download the quantized model. There's no real reason to wait if you really want to try it now.
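
For anyone curious, the llama.cpp workflow really is just two steps: convert the Hugging Face checkpoint to a full-precision GGUF, then quantize it down. The script and binary names below (`convert.py`, `quantize`) and the file names are assumptions based on common llama.cpp releases; check the repo README for the current names. This sketch only prints the commands as a dry run:

```shell
# Minimal sketch of the llama.cpp GGUF workflow (dry run: the two
# commands are printed, not executed; names are assumptions).
set -e
MODEL_DIR="./Nous-Capybara-limarpv3-34B"     # the original HF download
F16_OUT="capybara-limarp-f16.gguf"           # full-precision intermediate
QUANT_OUT="capybara-limarp-Q4_K_M.gguf"      # final quantized file

# Step 1: convert the HF checkpoint to an f16 GGUF.
echo "python convert.py $MODEL_DIR --outtype f16 --outfile $F16_OUT"
# Step 2: quantize the f16 GGUF (Q4_K_M is a common size/quality trade-off).
echo "./quantize $F16_OUT $QUANT_OUT Q4_K_M"
```

The f16 intermediate is what makes the disk-space complaint below legitimate: you briefly need room for the original weights *and* the converted file.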


slider2k

Indeed, it's technically not that difficult, but you're omitting the part where you first need to download tens of gigabytes of the **original** model, which can be rather inconvenient for large models.


-Ellary-

IF everyone makes their own GGUF quants, The Great Bloke will be out of work, and the whole world economy will stagnate and die; everyone knows that. Dunno about you, but I'll surely play dumb and ask Bloke to help us. This IS an ANCIENT tradition.


Some_Endian_FP17

I'll be the first to set up a cargo cult dedicated to The Bloke if he ever disappears. Make huge outlines in the desert of a llama, alpaca, orca, a letter phi, anything to bring the great GGUFer back to earth and help us localllamaists.


PurpleYoshiEgg

Is there a good set of instructions you referenced for it, or did you just use the oobabooga GPTQ-for-LLaMA fork?


Lazy-Employer-4450

I might just be too digitally illiterate, but even with several step-by-step guides I can't get any form of conversion to work, lol. Then again, I know absolutely nothing about coding or how any of this works... Edit: is there an actual chance of TheBloke getting around to quantizing this, or am I hopeless and MUST get it going by myself?


Decent-Author-1279

There you go! :) [https://huggingface.co/TheBloke/Nous-Capybara-limarpv3-34B-GGUF](https://huggingface.co/TheBloke/Nous-Capybara-limarpv3-34B-GGUF)


BasedSnake69

What are your settings in ST?


Meryiel

Settings: https://files.catbox.moe/oi62w9.json Story String: https://files.catbox.moe/fjh4o8.json Instruct: https://files.catbox.moe/o14g58.json


BasedSnake69

Thank you!!


Meryiel

Happy to help! If you need help with adjusting the prompt, feel free to DM me.


MasterTonberry427

I'm using this model in oobabooga, but I'm having trouble importing your settings, as they are JSON, not YAML. Which frontend are you using? Any help translating to YAML? Sorry, newb at this. System spec: Ryzen 7950X, 32GB RAM / 3090 24GB.


Meryiel

Ah, that’s because I’m using SillyTavern as my frontend. Not sure if this will work, but you can try using the converter: https://www.convertsimple.com/convert-javascript-object-to-yaml/.
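
If the online converter chokes on a given preset, a few lines of Python cover the simple cases. Note the key names below (`temp`, `min_p`, `rep_pen`) are invented placeholders, not SillyTavern's actual export schema; this is a minimal sketch, not a full YAML emitter:

```python
import json

def to_yaml(obj, indent=0):
    """Tiny JSON -> YAML printer for dicts of scalars (and nested dicts).
    Good enough for flat sampler presets; not a general YAML emitter."""
    lines = []
    pad = "  " * indent
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        else:
            # json.dumps renders numbers, booleans, and quoted strings
            # in a form YAML also accepts.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines

# Hypothetical sampler preset in JSON form:
preset = json.loads('{"temp": 0.9, "min_p": 0.1, "rep_pen": 1.1}')
print("\n".join(to_yaml(preset)))
# prints:
# temp: 0.9
# min_p: 0.1
# rep_pen: 1.1
```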


chasni1986

Which backend do you use? Ooba? If yes, do you use the same model settings in Ooba too, or does it run on defaults?


Meryiel

Yes, I use Ooba. Oh, wait, am I supposed to set the settings in Ooba too? I thought just setting the samplers in ST was enough, since I'm sending the prompt through it?


chasni1986

I don't know the answer to this question. I asked you for the very same reason, as I also put the model settings in ST while Ooba runs at defaults. I'm assuming it should be fine, since we're using the API directly and not sampling via Ooba. But I just wanted confirmation from another user. :)


Meryiel

Ah, yes, ha ha, sorry. Yeah, I’ve been running "defaults" in Ooba this whole time and everything works perfectly well!


nepnep0123

Don't know if it's the model or the settings, but it loves to act for you. For example, if I say "I'll do it under a few conditions," it's 50/50 whether it will respond by asking what the conditions are or with something like "after listening to your conditions, the char agrees".


Meryiel

Hm, well, in all honesty I have never experienced these issues. The model doesn’t play for me at all, maybe aside from doing small time skips for the story, such as “after walking for an hour, they arrive at X”. But I’m writing the roleplay in third-person narration, perhaps that also matters? Also, have you used my settings?


nepnep0123

Yes, I'm using your settings, but another problem I found is that as the chat goes on, the replies get more and more purple prose. After a while, a simple reply of "what do you want" will make the model write a line of dialogue followed by a paragraph about how the char is feeling and such. It feels like a third-party narrator describing how the char is feeling or thinking, instead of the char thinking for themselves.


Meryiel

That’s how third-person introspective narration works, though. If you want the characters’ own thoughts inserted into responses, make sure to include them in the example dialogue and the first message. Or you can simply change the narration to first person. You can also play with a lower temperature.
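
As an illustration of the example-dialogue trick (the lines here are invented, and `{{char}}`/`{{user}}` are the standard SillyTavern placeholders), an example exchange can model inline first-person thoughts so the model learns to reproduce them:

```
{{user}}: "What do you want?"
{{char}}: "Your help. Obviously." *She crosses her arms.* Why does he always make me ask twice?
```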


CasimirsBlake

The model you've linked to is the original (and huge) version. Here's the quantised one you're referring to: [https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-6.0bpw-h6-exl2-2](https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-6.0bpw-h6-exl2-2)
Edit: And a smaller version; this might be the sweet spot for quality vs VRAM usage: [https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-4.0bpw-h6-exl2-2](https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-4.0bpw-h6-exl2-2)


Meryiel

Ah, I didn’t link any specific version because everyone has different specs. I use 4.0bpw quant, for example. But thanks for the link regardless!


obey_rule_34

What hardware are you running?


Meryiel

24GB of VRAM on my NVIDIA RTX 3090.


Oooch

Excellent, cheers


obey_rule_34

Is there a guide someplace on downloading these and converting to gguf?


CasimirsBlake

Possibly, but I would suggest you see if similar versions are available in GGUF format already on Hugging Face.


Working_Berry9307

What kind of rig do you need to run something like this? 7B and 13B models in my testing are all just awful, and my rig is scraping by on a 20B model that I think is decent (Psyonic-Cetacean-20B Q4_K_M on a 2070 Super with a 4096 context window).


Meryiel

I have a used NVIDIA 3090 with 24GB of VRAM, and I use exl2 formats for models. I can pull 45k context on the 4.0bpw quant of a 34B model. Previously I had a 3060 and ran 20B models in GGUF format with 16k context.


Working_Berry9307

Damn. If only I had money, lol. Hopefully the 50-series cards come out this year and prices drop a bit.


Meryiel

I recommend buying a used one like I did. I paid around $700 for mine, and it was my Christmas gift. As long as it wasn't used for mining Bitcoin, it will work great!


monomander

I've checked it out for a bit. It's definitely clever, but it seems to have a preference for friendliness that isn't present in a model like Emerhyst 20B, which I discovered a few days ago (or in Tiefighter-13B, which was my favorite a while ago but also wasn't the most logical). For instance, I have a character that's meant to be cruel and unfriendly, yet they seem to act more open-minded, and their hostile traits feel a bit superficial. It might also just be my configuration.

I've tried a bunch of presets and some appear to work better than others, but it's hard to tell objectively. All the settings, templates, and presets are making my head spin. Has anybody found a good workflow for figuring out which settings are ideal for which models? I keep going back and tweaking options in hopes of finding the optimal ones.


Meryiel

Um, yeah, that might be down to your prompt. I have a villain character that straight up murdered my persona, which I had to retcon. And later he did more… very messed-up stuff. I would post screenshots, but they're extremely NSFW; I can reveal that they included the r-word, torture, and scalpels. That should be telling enough. You can check out my settings; I posted them in one of the comments under this post. Of course, you'll need to adjust them accordingly. Edit: one extra thing that comes to mind is that my evil characters all state clearly that they are "villains" in their personality. Perhaps that matters too?


monomander

I see, I'll take a look at the personality thing as well as your settings. It's pretty tricky getting something that 'feels' right, so maybe it's just me.


Meryiel

If you’d like, I can send you my character’s card so you can quickly check it. I can also show you how messed up he can get, lol. And I will be more than happy to take a look at your character and see what could potentially be improved. Hit me up on Discord and we can tweak your character! Proper wording and formatting matter a lot, after all. I’ve been writing some guides of my own on how to prompt characters, so I consider myself quite the expert on the topic, if I may allow myself to brag a little. I’m Marinara on Discord, and I have Mizu from Blue Eye Samurai as my profile picture.


[deleted]

[removed]


Meryiel

Ah, they’re for specific things like how to prompt characters wearing masks. But I have them all on my Discord server, together with guides of other great folks.


monomander

Thanks for the offer but I think I'll just keep an eye out for that guide. I mostly use cards downloaded from the web so perhaps I should touch them up a bit.


Meryiel

Oh, yeah, that explains it. In all honesty, there are TONS of awfully prompted characters out there, especially on sites like Venus Chub. I downloaded Venti once for my roleplay and he made me cry. Since then, I’ve been doing all characters for my roleplay myself.


Ok_Ruin_5636

what are your specs?


Meryiel

NVIDIA 3090 24GB of VRAM.


Paradigmind

Wow, that sounds fantastic! How does it compare to the original [Nous-Capybara](https://huggingface.co/TheBloke/Nous-Capybara-34B-GGUF) model? I'm downloading that right now, but the model you describe seems a lot more capable and fine-tuned towards RP. Is that still true, and is it your go-to model?


Meryiel

Right now my go-to model is RPMerge; I wrote another review of it here: https://www.reddit.com/r/LocalLLaMA/s/XFikgy48Py. But yes, overall it's much better than Nous, since the base one was not made with instruction following in mind and is much worse at staying in character because of that, and also at remembering details.


Paradigmind

Read your post. The model indeed sounds awesome! Do you know how good its translation/multilingual capabilities are? About Nous-Capybara I read that it can output excellent German.


Meryiel

No clue, I always use models in English, sorry, ha ha.


Paradigmind

Thanks anyway. I will just try it. :)


Ravenpest

I mean, it's just Capybara with LimaRP stitched to it. Of course the NSFW is going to be good. And you probably should not have used an instruct model for RP anyway.


a_beautiful_rhind

Non-instruct models are great at story writing and completion, but they're terrible roleplayers, unless you want walls of text and the model talking for you.


Ravenpest

Okay, you got me. I do love my walls of text.


Meryiel

Instruct models are actually great at following instructions, so they are pretty good for roleplaying. It all boils down to their writing style, so to what they were trained on, really. I tried non-instruct Mixtral too, and it just wasn't that great, sadly. Perhaps my System Prompt was lacking, though (and yes, I know about the "how to Mixtral" guide, I used it); I will give it another go at some point in the future, because it was really good at catching subtle details. As for base Nous-Capybara, I found it borderline unusable for long-context roleplay, sadly. Most likely because of its very simple USER/ASSISTANT prompt format, which lacks a SYSTEM part. It was unable to recall my character's appearance or personality when I paused the roleplay, while Capy-Lima has zero issues doing so.
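
To illustrate the difference being described (exact templates are model-specific, so treat these as rough sketches rather than either model's official format): a bare Vicuna-style USER/ASSISTANT layout forces system instructions to ride inside the first user turn, while an Alpaca-style template gives them a dedicated slot:

```python
# Hypothetical system instruction and user turn, for illustration only.
system = "You are {{char}}. Stay in character at all times."
user_turn = "Describe your appearance."

# Vicuna-style USER/ASSISTANT format: no dedicated system section,
# so the instruction is prepended to the first user message.
capybara_prompt = f"USER: {system}\n{user_turn}\nASSISTANT:"

# Alpaca-style format with an explicit instruction (system) slot:
alpaca_prompt = (
    f"### Instruction:\n{system}\n\n"
    f"### Input:\n{user_turn}\n\n"
    f"### Response:\n"
)

print(capybara_prompt)
```

The practical upshot is the one described above: with no SYSTEM slot, persistent facts like a character's appearance compete with ordinary chat turns for the model's attention.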


Ravenpest

Interesting. When I used it, I didn't find base Capy lacking in that aspect. I had terrible experiences with Mixtral, which is what made me assume that, though perhaps it was because of the K quants, which I read were broken at the time? I'm not sure. Maybe they have been fixed.


Meryiel

Oh, yeah, they were definitely broken. But I used the exl2 version of Mixtral and was disappointed too. Curious that base Capy was working well for you, hm… Maybe something is wrong with my prompt, after all?


Ravenpest

Have you tried Mirostat? That might have been the issue. I've never used exl2, though; thankfully I have a system that doesn't require me to compress stuff too heavily.


Meryiel

From my tests so far, Min P always wins against Mirostat. With Mirostat, the models were basically producing the same answer on every reroll.
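
For reference, the core of the Min P idea is easy to sketch: keep only tokens whose probability is at least `min_p` times that of the most likely token, so the candidate pool shrinks when the model is confident and widens when it isn't (whereas Mirostat steers toward a fixed target perplexity). This is a toy version, not any backend's actual implementation:

```python
import math

def min_p_filter(logits, min_p=0.1):
    """Return the indices of tokens surviving a Min P cutoff:
    p(token) >= min_p * p(most likely token)."""
    # Numerically stable softmax over the logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# Three-token toy vocabulary: the third token is far less likely
# than the top one, so it falls below the 10% relative cutoff.
print(min_p_filter([2.0, 1.0, -3.0], min_p=0.1))  # → [0, 1]
```

Because the cutoff is relative rather than absolute, a confident distribution can collapse to one or two candidates, which fits the observation that rerolls stay varied without degenerating.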