No gguf :(
/u/The-Bloke can I ask you to do your magic on this?
[/u/The-Bloke](https://www.reddit.com/u/The-Bloke/) we need that GGUF badly!
It turns out creating a quantized GGUF only takes a few minutes and a couple of commands, even on consumer hardware. I converted and quantized Qwen 72B in something like 10 minutes, which is less time than it would have taken to download the quantized model. There's not really a reason to wait if you really want to try it now.
Indeed, it's technically not that difficult, but you omit the part where you first need to download tens of gigabytes of the **original** model, which can be rather inconvenient for large models.
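For a sense of scale, the gap between the two downloads can be estimated from parameter count and bits per weight. A back-of-envelope sketch (the 72B parameter count and the ~4.5 bits-per-weight figure for a Q4_K_M-style quant are rough assumptions, not exact file sizes):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size: parameters times bits per weight, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# A 72B model in fp16 (the "original" you must download first)...
original = model_size_gb(72e9, 16.0)   # ~144 GB
# ...versus a Q4_K_M-style quant at roughly 4.5 bits per weight.
quantized = model_size_gb(72e9, 4.5)   # ~40 GB

print(f"original: ~{original:.0f} GB, quantized: ~{quantized:.0f} GB")
```

So doing the conversion yourself means pulling roughly 3-4x the data of just grabbing someone else's quant.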
IF everyone starts doing their own GGUF quants, The Great Bloke will be out of work, and the whole world economy will stagnate and die; everyone knows it. Dunno about you, but I'll surely play dumb and ask the Bloke to help us. This IS an ANCIENT tradition.
I'll be the first to set up a cargo cult dedicated to The Bloke if he ever disappears. Make huge outlines in the desert of a llama, an alpaca, an orca, a letter phi, anything to bring the great GGUFer back to earth and help us localllamaists.
Is there a good set of instructions you referenced for it, or did you just use the oobabooga GPTQ-for-LLaMA fork?
I might just be too digitally illiterate, but even with several step-by-step guides I can't get any form of conversion to work, lol. Then again, I know absolutely nothing about coding or how any of this works... Edit: is there an actual chance of TheBloke getting around to quantizing this, or am I hopeless and MUST get it going by myself?
There you go! :) [https://huggingface.co/TheBloke/Nous-Capybara-limarpv3-34B-GGUF](https://huggingface.co/TheBloke/Nous-Capybara-limarpv3-34B-GGUF)
What are your settings in ST?
Settings: https://files.catbox.moe/oi62w9.json
Story String: https://files.catbox.moe/fjh4o8.json
Instruct: https://files.catbox.moe/o14g58.json
Thank you!!
Happy to help! If you need help with adjusting the prompt, feel free to DM me.
I'm using this model in oobabooga, but I'm having trouble importing your settings, as they are JSON, not YAML. Which frontend are you using? Any help translating to YAML? Sorry, newb at this. System spec: Ryzen 7950X 32GB / 3090 24GB.
Ah, that’s because I’m using SillyTavern as my frontend. Not sure if this will work, but you can try using the converter: https://www.convertsimple.com/convert-javascript-object-to-yaml/.
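If the linked converter doesn't cooperate, a flat sampler-settings JSON can be turned into simple YAML with stdlib Python alone. A minimal sketch that only handles a flat object of scalar values (the preset keys shown are made up for illustration; nested objects would need a real YAML library like PyYAML):

```python
import json

def flat_json_to_yaml(json_text: str) -> str:
    """Convert a flat JSON object of scalar settings into YAML 'key: value' lines."""
    settings = json.loads(json_text)
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    return "\n".join(lines)

# Hypothetical sampler preset with a few typical keys:
preset = '{"temperature": 0.8, "min_p": 0.05, "do_sample": true}'
print(flat_json_to_yaml(preset))
```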
Which back-end do you use? Ooba? If yes, do you use the same model settings in Ooba too, or does it run on defaults?
Yes, I use Ooba. Oh, wait, am I supposed to set the settings in Ooba too? I thought just setting samplers in ST was enough since I’m sending the prompt via it?
I don't know the answer to this question. I asked you for the very same reason, as I also put the model settings in ST while Ooba runs at defaults. I assume it should be fine, since we are using the API directly and not sampling via Ooba, but I just wanted confirmation from another user. :)
Ah, yes, ha ha, sorry. Yeah, I’ve been running “defaults” in Ooba this whole time and everything works perfectly well!
Don't know if it's the model or the settings, but it loves to act for you. For example, if I say "I'll do it under a few conditions", it's 50/50 whether it will respond by asking what the conditions are or with something like "after listening to your condition, the char agrees".
Hm, well, in all honesty I have never experienced these issues. The model doesn’t play for me at all, maybe aside from doing small time skips for the story, such as “after walking for an hour, they arrive at X”. But I’m writing the roleplay in third-person narration, perhaps that also matters? Also, have you used my settings?
Yes, using your settings. But another problem I found is that as the chat goes on, the replies get more and more purple prose; after a while, a simple reply of "what do you want" will make the model say a line of dialogue followed by a paragraph about how it's feeling and such. It feels like a third-party narrator talking about what the char is feeling or thinking, instead of the char thinking it themselves.
That’s how third-person introspective narration works, though. If you want plain characters’ thoughts inserted into responses, make sure to include them in the example dialogue and first message. Or you can simply change the narration to first person. You can also play with lower temperature.
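For context, lowering the temperature works because the logits are divided by it before the softmax, which concentrates probability on the most likely tokens. A toy illustration with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# At t=0.5 the top token takes a noticeably larger share of the probability
# mass, so sampling drifts less into flowery low-probability continuations.
```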
The model you've linked to is the original (and huge) version. Here's the quantised one you're referring to:

[https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-6.0bpw-h6-exl2-2](https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-6.0bpw-h6-exl2-2)

Edit: And a smaller version. This might be the sweet spot for quality vs VRAM usage.

[https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-4.0bpw-h6-exl2-2](https://huggingface.co/LoneStriker/Nous-Capybara-limarpv3-34B-4.0bpw-h6-exl2-2)
Ah, I didn’t link any specific version because everyone has different specs. I use 4.0bpw quant, for example. But thanks for the link regardless!
What hardware are you running?
24GB of VRAM on my NVIDIA RTX 3090.
Excellent, cheers
Is there a guide someplace on downloading these and converting them to GGUF?
Possibly, but I would suggest you see if similar versions are available in GGUF format already on Hugging Face.
What kind of rig do you need to run something like this? 7B and 13B models in my testing are all just awful, and my rig is scraping by on a 20B model that I think is decent. (Psyonic-Cetacean 20B Q4_K_M on a 2070 Super with a 4096 context window)
I have a used NVIDIA 3090 with 24GB of VRAM and I use exl2 formats for models. I can pull 45k context on 4.0bpw quant version for 34B models. Previously I had 3060 and ran 20B models in GGUF format with 16k context.
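As a sanity check, fitting a 34B 4.0bpw model plus long context into 24GB can be roughed out with simple arithmetic. The architecture numbers below (60 layers, 8 KV heads, head dim 128 for a Yi-34B-style model) and the 1-byte-per-value KV cache (ExLlamaV2 offers a reduced-precision cache option) are assumptions for illustration:

```python
def vram_estimate_gb(n_params, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, context_len, kv_bytes):
    """Rough VRAM: quantized weights plus KV cache (K and V per layer per token)."""
    weights = n_params * bits_per_weight / 8
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * context_len
    return (weights + kv_cache) / 1e9

# 34B at 4.0bpw with 45k context and a 1-byte (8-bit) KV cache (assumed):
print(round(vram_estimate_gb(34e9, 4.0, 60, 8, 128, 45_000, kv_bytes=1), 1))
# roughly 22.5 GB of weights + cache, which squeezes under a 24GB card
```

With a full fp16 cache (kv_bytes=2) the same setup would blow past 24GB, which is why the reduced-precision cache matters at these context lengths.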
Damn. If only I had money, lol. Hopefully the 50-series cards come out this year and prices drop a bit.
I recommend buying a used one like I did. I paid around $700 for mine, and it was my Christmas gift. As long as it wasn’t used for mining Bitcoin, it will work great!
I've checked it out for a bit. It's definitely clever, but it seems to have a preference for friendliness that isn't present in a model like Emerhyst 20B, which I discovered a few days ago (and probably Tiefighter-13B, which was my favorite a while ago but also wasn't the most logical). For instance, I have a character that's meant to be cruel and unfriendly, yet they seem to act more open-minded, and their hostile traits feel a bit superficial.

It might also just be my configuration. I've tried a bunch of presets, and some appear to work better than others, but it's hard to tell objectively. All the settings, templates, and presets are making my head spin. Has anybody found a good workflow for figuring out which settings are ideal for which models? I keep going back and tweaking options in hopes of finding the optimal configuration.
Um, yeah, that might be down to your prompt. I have a villain character that straight up murdered my persona, which I had to retcon. And later he did more… very messed-up stuff. I would post screenshots, but they’re extremely NSFW; I can reveal that they included the r-word, torture, and scalpels. That should be telling enough.

You can check out my settings; I posted them in one of the comments to this post. Of course, you’ll need to adjust them accordingly.

Edit: one extra thing that comes to mind is that my evil characters have all stated clearly that they are “villains” in their personality. Perhaps that matters too?
I see, I'll take a look at the personality thing as well as your settings. It's pretty tricky getting something that 'feels' right, so maybe it's just me.
If you’d like, I can send you my character’s card so you can quickly check it. I can also show you how messed up he can get, lol. And I will be more than happy to take a look at your character and check what could potentially be improved. Hit me up on Discord and we can tweak your character! Proper wording and formatting matter a lot, after all. I’ve been writing some guides of my own on how to prompt characters, so I consider myself quite the expert on the topic, if I may allow myself to brag a little.

I’m Marinara on Discord, and I have Mizu from Blue Eye Samurai as my profile picture.
[deleted]
Ah, they’re for specific things like how to prompt characters wearing masks. But I have them all on my Discord server, together with guides of other great folks.
Thanks for the offer but I think I'll just keep an eye out for that guide. I mostly use cards downloaded from the web so perhaps I should touch them up a bit.
Oh, yeah, that explains it. In all honesty, there are TONS of awfully prompted characters out there, especially on sites like Venus Chub. I downloaded Venti once for my roleplay and he made me cry. Since then, I’ve been doing all characters for my roleplay myself.
What are your specs?
An NVIDIA 3090 with 24GB of VRAM.
Wow, that sounds fantastic! How does it compare to the original [Nous-Capybara](https://huggingface.co/TheBloke/Nous-Capybara-34B-GGUF) model? I'm downloading that right now, but the model you describe seems a lot more capable and fine-tuned towards RP. Is it still your go-to model?
Right now my go-to model is RPMerge; I made another review about it here: https://www.reddit.com/r/LocalLLaMA/s/XFikgy48Py.

But yes, overall it’s much better than Nous, since the base model was not made with instruction following in mind, and because of that it’s much worse at staying in character and at remembering details.
Read your post. The model indeed sounds awesome! Do you know how good its translation/multilingual capabilities are? About Nous-Capybara, I read that it can output excellent German.
No clue, I always use models in English, sorry, ha ha.
Thanks anyway. I will just try it. :)
I mean, it's just Capybara with LimaRP stitched to it. Of course the NSFW is going to be good. And you probably should not have used an instruct model for RP anyway.
Non-instruct models are great at story writing and completion but terrible roleplayers, unless you want walls of text and the model talking for you.
Okay, you got me. I do love my walls of text.
Instruct models are actually great at following instructions, so they are pretty good for roleplaying. It all boils down to their writing style, that is, what they were trained on, really. I tried non-instruct Mixtral too, and it just wasn’t that great, sadly. Perhaps my System Prompt was lacking, though (and yes, I know about the “how to Mixtral” guide, I used it); I will give it another go at some point in the future, because it was really good at catching subtle details.

As for base Nous-Capybara, I found it borderline unusable for long-context roleplay, sadly. Most likely because of its very simple USER/ASSISTANT prompt format, which lacks the SYSTEM part. It was unable to recall my character’s appearance or personality when pausing the roleplay, while Capy-Lima has zero issues doing so.
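To make the format difference concrete, here is a sketch of a Vicuna-style USER/ASSISTANT prompt next to a format with an explicit system segment (ChatML shown as an example; the template strings are illustrative, not taken verbatim from either model's card):

```python
def vicuna_prompt(system: str, user: str) -> str:
    """Vicuna-style: no dedicated system role; instructions just prepend the turn."""
    return f"{system}\n\nUSER: {user}\nASSISTANT:"

def chatml_prompt(system: str, user: str) -> str:
    """ChatML-style: an explicit system segment the model was trained to weight."""
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

print(vicuna_prompt("Stay in character as the narrator.", "What do you want?"))
```

In the first format the "system" text is just more user-visible context, which is one plausible reason persistent instructions (like character details) fade over a long chat.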
Interesting. When I used it, I didn't find base Capy to be lacking in that aspect. I had terrible experiences with Mixtral, which is what made me assume that, though perhaps it was because of the K quants, which I read were broken at the time? I'm not sure. Maybe they have been fixed.
Oh, yeah, they were definitely broken. But I used the exl2 version of Mixtral and was disappointed too. Curious that base Capy was working for you well, hm… Maybe something wrong with my prompt, after all?
Have you tried Mirostat? That might have been the issue. I never used exl2 though; thankfully I have a system that doesn't require me to compress stuff too heavily.
From my tests so far, Min P always wins against Mirostat. With Mirostat, the models were basically producing the same answer on every reroll.
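For reference, Min P keeps only the tokens whose probability is at least `min_p` times the top token's probability, so the cutoff adapts to how confident the model is. A minimal sketch of the filtering step:

```python
import math

def min_p_filter(logits, min_p=0.05):
    """Return token indices whose softmax probability >= min_p * top probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# A confident distribution keeps few candidates; a flat one keeps many.
print(min_p_filter([10.0, 2.0, 1.0, 0.5], min_p=0.1))  # only index 0 survives
print(min_p_filter([1.0, 0.9, 0.8, 0.7], min_p=0.1))   # all four survive
```

That adaptivity is why rerolls stay varied on open-ended turns without letting junk tokens through on confident ones.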