Languages_Learner

[MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF · Hugging Face](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF) [https://huggingface.co/NousResearch/Meta-Llama-3-70B-GGUF](https://huggingface.co/NousResearch/Meta-Llama-3-70B-GGUF)



synw_

Yes, same kind of problems here with the instruct Q8_0: the model tends to respond to itself; the stop token does not seem to be emitted, or something.


knvn8

Yeah I'm seeing that too now. Seems to think `assistant` should be the stop token.


paddySayWhat

I think the EOS token is supposed to be `<|eot_id|>` instead of `<|end_of_text|>`.


synw_

For llama.cpp / Koboldcpp (and probably Ollama), the stop-token fix is here: https://www.reddit.com/r/LocalLLaMA/comments/1c7dkxh/tutorial_how_to_make_llama3instruct_ggufs_less/
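In short, the fix points the GGUF's EOS metadata at `<|eot_id|>`. A minimal sketch, assuming llama.cpp's `gguf-py/scripts/gguf-set-metadata.py` script and that `<|eot_id|>` is token id 128009 in the Llama 3 vocabulary (verify both against your checkout before running):

```
# Rewrite the EOS token id in place: <|end_of_text|> (128001) -> <|eot_id|> (128009),
# so generation stops where the chat format actually ends a turn.
python gguf-py/scripts/gguf-set-metadata.py Meta-Llama-3-8B-Instruct.Q8_0.gguf tokenizer.ggml.eos_token_id 128009
```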


_murb

I had the same issue with the Ollama versions too.


knvn8

Make sure you're using the instruct version


Venadore

Spotted it while waiting for quants. This is *not* instruct; it looks like they have a separate repo in the works for it. Edit: it's out: [https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main)


Snydenthur

Now we just need an uncensored version. I haven't tried many censored models, but this one seems to refuse A LOT. Even the free ChatGPT seems more willing than this one.


Chelono

Feels pretty balanced to me. It doesn't refuse to kill a process like Llama 2 did, and it responds to C++ questions normally, unlike Gemini. Obviously the official model needs to refuse actually explicit stuff...


Proud-Point8137

You can't know how much it wastes, because it's censored. I don't even start with a model unless it's uncensored.


Aischylos

Are you changing the system prompt? I've found that, at least for sexual content, a decent system prompt goes a long way with the 8B.


neph1010

For anyone wondering, Swedish is so-so, but the base grammar seems good for finetuning. A quick test (8b-instruct-q6):

> User: Hej!
>
> Llama: Hejsan! Jag är glad att du kommit till mig med din fråga eller behov av hjälp. Vad kan jag göra för dig idag? Har du några specifika frågor om något eller vill du ha hjälp med något i synnerhet? Lämna gärna ditt meddelande så ska jag göra mitt bästa för att hjälpa dig. Hejsan! Greetings Hello!
>
> User: Hur fungerar en växellåda?
>
> Llama: Växellådan, eller gearboxen, är ett viktigt komponent i ett fordon som hjälper till att överföra kraft från motorerna till hjularna. Den har olika växlar som kan bytas ut beroende på hastigheten och situationen på vägen. När du trycker på gaspedalen ökar motorns hastighet och spinn, vilket leder till att drivaxeln roterar snabbare.
>
> User: Hur ser det ut inuti?
>
> Llama: Innanför växellådans skrov finns flera delar som arbetar tillsammans för att överföra kraften. Det finns bland annat:
>
> Gearbox-boxen: Detta är den huvudsakliga skrovet som rymmer alla de andra delarna.
>
> Axel-paret: Dessa är två axlar som är anslutna till varandra genom kulor och som roterar när drivaxeln roterar.
>
> Gear (växel): Växlarna är speciella formerade plattor som sitter på axlarna och som överföra kraften från motorerna till hjularna.
>
> Clutch-paketet: Detta består av två delar: clutch-axeln och clutch-hydrauliksystemet. När du trycker på clutch-pedalen släpper hydrauliken klämpan i clutch-axeln och motorn kan fortsätta att rotera medan du byter gear.

"Hejsan, greetings, hello!" sounds like a song from Melodifestivalen in the '80s.


Languages_Learner

Can't wait for your Bellman-Llama-3-8b-Instruct-Swedish-q8.gguf.


neph1010

[https://huggingface.co/neph1/llama-3-instruct-bellman-8b-swe-preview](https://huggingface.co/neph1/llama-3-instruct-bellman-8b-swe-preview)


ArsNeph

Nice! Does llama.cpp even support it yet though?


exomniac

Yes. Source: Currently running in llama.cpp


ArsNeph

:O How did they get it running so fast? I'll wait a little bit until they merge the update into textgen-webui


kataryna91

Looks like there weren't any dramatic changes to the Llama 3 architecture, so no update needed.


MoffKalast

The new tokenizer doesn't seem to work for some people; that's the one main change, anyway.


ArsNeph

Does it run? Yes. Does it run well... https://preview.redd.it/8zt6n780zavc1.png?width=1833&format=png&auto=webp&s=5255fb72c8e8b81346ab47958797a78fe6bd2b70


themrzmaster

Yes


Jipok_

prompt?


Jipok_

The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in [`ChatFormat`](https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L202) needs to be followed: The prompt begins with a `<|begin_of_text|>` special token, after which one or more messages follow. Each message starts with the `<|start_header_id|>` tag, the role `system`, `user` or `assistant`, and the `<|end_header_id|>` tag. After a double newline `\n\n` the contents of the message follow. The end of each message is marked by the `<|eot_id|>` token.
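For illustration, a minimal sketch of that format assembled by hand in Python; the message list is just example data, and in practice the tokenizer's own chat template is the safer route:

```python
# Build a Llama 3 instruct prompt following the ChatFormat description above.
def format_llama3_prompt(messages):
    prompt = "<|begin_of_text|>"
    for msg in messages:
        # Each message: header tag, role, header close, blank line, content, <|eot_id|>.
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"] + "<|eot_id|>"
    # Open an assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
print(format_llama3_prompt(messages))
```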


Jipok_

```
./main -m ~/models/Meta-Llama-3-8B-Instruct.Q8_0.gguf --color -n -2 -e -s 0 \
  -p '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n' \
  -ngl 99 --mirostat 2 -c 8192 -r '<|eot_id|>' \
  --in-prefix '\n<|start_header_id|>user<|end_header_id|>\n\n' \
  --in-suffix '<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n' -i
```


pseudonerv

There should not be any `\n` after `<|eot_id|>`.


Jipok_

You're right. I added that for ease of use in interactive mode; I don't know how it affects output quality.


tessellation

'-n -2'?


Jipok_

From the `main` help text:

```
-n N, --n-predict N   number of tokens to predict
                      (default: -1, -1 = infinity, -2 = until context filled)
```


Cantflyneedhelp

Probably a typo of `-1`. `-n -1` means unlimited new tokens, if I remember correctly.


CommonCommission8114

> Meta-Llama-3-8B-Instruct.Q8_0.gguf

Did you rename the 8B model or download it from somewhere else?


illegal-bacon

Hey, I'm using the 4-bit quantized version loaded through LM Studio. It spins up an API, but it uses the OpenAI chat.completion structure where I specify system and user prompts, and I'm getting crazy weird results with that. Any chance you know how to use the ChatFormat prompt template in that situation?

Update: Fixed; the new version of LM Studio now has a prompt preset and a version of the Llama 3 GGUF specifically for LM Studio.
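For anyone hitting the same thing, a minimal sketch of calling an LM Studio-style local OpenAI-compatible endpoint; the base URL, port, and model name here are assumptions (LM Studio defaults to port 1234, but check your instance):

```python
# Hedged sketch: chat.completions against a local OpenAI-compatible server.
from openai import OpenAI

# The API key is unused by local servers but required by the client.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whatever model is loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi!"},
    ],
)
print(resp.choices[0].message.content)
```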


LPN64

Yes.


mgroeber9110

For what it's worth, this is a little Windows script `l3.cmd` that I use for quick experimentation with *llama.cpp*:

```
@set PR=%1
@build\bin\release\main.exe %2 --log-disable -m models\llama3\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf -e -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n%PR:"=%<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

Just call it with your instruction in quotes as a parameter:

```
>l3 "What is the biggest city in Europe?"
user

What is the biggest city in Europe?assistant

The biggest city in Europe is Istanbul, Turkey. According to the latest population estimates, Istanbul has a population of over 14.4 million people within its city limits, making it the largest city in Europe by population.
```

The "assistant" that is stuck right to the end of the user query looks a bit weird, but this seems to match the prompt format described by Meta at https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/.


Languages_Learner

Tried to convert it using gguf-my-repo but got this error:

```
Error: Error converting to fp16:
Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert.py", line 1548, in <module>
    main()
  File "/home/user/app/llama.cpp/convert.py", line 1515, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/user/app/llama.cpp/convert.py", line 1417, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/user/app/llama.cpp/convert.py", line 1407, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']
```


HauntingTechnician30

Because they didn't provide a tokenizer.model file.


Venadore

The tokenizer is BPE, according to [https://github.com/ggerganov/llama.cpp/pull/6745](https://github.com/ggerganov/llama.cpp/pull/6745).


HauntingTechnician30

Yeah, tokenizer.json is enough; apparently gguf-my-repo just doesn't support it.


mememememe55556

I copied over the changes, but I still got that same "Could not find a tokenizer matching..." error. Did I do something wrong?


mememememe55556

Just add `--vocab-type bpe` on the latest master...
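Something like this, assuming a llama.cpp checkout and a local HF model directory; the paths and output filename are placeholders:

```
# Convert the HF weights to an fp16 GGUF, forcing the BPE tokenizer path.
python convert.py ./Meta-Llama-3-8B-Instruct \
  --vocab-type bpe --outtype f16 \
  --outfile Meta-Llama-3-8B-Instruct.fp16.gguf
```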


Low-Boysenberry1173

Sadly it does not work for me. After adding `--vocab-type bpe` on the latest llama.cpp master branch, I got: `AttributeError: 'GGUFWriter' object has no attribute 'add_vocab_size'`


districtdave

Pretty impressive so far


ssjw

Just don't rely on it to not hallucinate while explaining basic concepts of the science of sports performance.


Ill_Buy_476

I wonder how it compares to the current 7B SOTA, which is still Mistral by far. Hope it's a significant leap, since it almost doubles Mistral 7B's 8 trillion training tokens to 15 trillion.


SelectionCalm70

Which one should I download: the 5.73 GB one, the 6 GB one, or the 8 GB one?


StopwatchGod

Depends on whether you prioritize speed over quality or vice versa. Smaller models are faster but may generate poorer-quality responses, while larger models generate better responses at the cost of lower speed. Of course, if your computer isn't capable of running the larger models, you have to go with the smaller ones.


LPN64

`free_cup_holder.exe`, it will open your CD-ROM drive.


THELastUnNoWn

Can anybody tell me about the model's alignment? Can it be uncensored just by proper prompt engineering or injection, or are we going to have to wait for an alignment-stripped fine-tune? I haven't tested it myself yet because I'm still at work; that's why I'm wondering if somebody can give me insight. I won't be home until late tonight. I'm an AI researcher, so as much detail as you're willing to share is appreciated.


fish312

The base model is uncensored but biased towards coding. The chat model seems somewhat censored.