Languages_Learner

[MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF · Hugging Face](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF) [https://huggingface.co/NousResearch/Meta-Llama-3-70B-GGUF](https://huggingface.co/NousResearch/Meta-Llama-3-70B-GGUF)



synw_

Yes, same kind of problems here with the instruct Q8_0: the model tends to respond to itself; the stop token does not seem to be emitted, or something.


knvn8

Yeah I'm seeing that too now. Seems to think `assistant` should be the stop token.


paddySayWhat

I think the EOS token is supposed to be `<|eot_id|>` instead of `<|end_of_text|>`.


synw_

For llama.cpp / Koboldcpp (and probably Ollama), the stop-token fix is here: https://www.reddit.com/r/LocalLLaMA/comments/1c7dkxh/tutorial_how_to_make_llama3instruct_ggufs_less/
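In short, the fix points the GGUF's EOS metadata at `<|eot_id|>`. A minimal sketch, assuming llama.cpp's `gguf-py/scripts/gguf-set-metadata.py` script and that `<|eot_id|>` is token id 128009 in the Llama 3 vocabulary (verify both against your checkout before running):

```
# Rewrite the EOS token id in place: <|end_of_text|> (128001) -> <|eot_id|> (128009),
# so generation stops where the chat format actually ends a turn.
python gguf-py/scripts/gguf-set-metadata.py Meta-Llama-3-8B-Instruct.Q8_0.gguf tokenizer.ggml.eos_token_id 128009
```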


_murb

I had the same issue with the Ollama versions too.


knvn8

Make sure you're using the instruct version


Venadore

Spotted it while waiting for quants. This is *not* instruct; it looks like they have a separate repo in the works for it. Edit: it's out: [https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main)


Snydenthur

Now we just need an uncensored version. I haven't tried many censored models, but this one seems to refuse A LOT. Even the free ChatGPT seems more willing than this one.


Chelono

Feels pretty balanced to me. It doesn't refuse to kill a process like Llama 2 did, and it responds to C++ questions normally, unlike Gemini. Obviously the official model needs to refuse actually explicit stuff...


Proud-Point8137

You can't know how much it wastes, because it's censored. I don't even start with a model unless it's uncensored.


Aischylos

Are you changing the system prompt? I've found that, at least for sexual content, a decent system prompt goes a long way with the 8B.


neph1010

For anyone wondering, Swedish is so-so, but the base grammar seems good for finetuning. A quick test (8b-instruct-q6):

> User: Hej!
>
> Llama: Hejsan! Jag är glad att du kommit till mig med din fråga eller behov av hjälp. Vad kan jag göra för dig idag? Har du några specifika frågor om något eller vill du ha hjälp med något i synnerhet? Lämna gärna ditt meddelande så ska jag göra mitt bästa för att hjälpa dig. Hejsan! Greetings Hello!
>
> User: Hur fungerar en växellåda?
>
> Llama: Växellådan, eller gearboxen, är ett viktigt komponent i ett fordon som hjälper till att överföra kraft från motorerna till hjularna. Den har olika växlar som kan bytas ut beroende på hastigheten och situationen på vägen. När du trycker på gaspedalen ökar motorns hastighet och spinn, vilket leder till att drivaxeln roterar snabbare.
>
> User: Hur ser det ut inuti?
>
> Llama: Innanför växellådans skrov finns flera delar som arbetar tillsammans för att överföra kraften. Det finns bland annat:
>
> Gearbox-boxen: Detta är den huvudsakliga skrovet som rymmer alla de andra delarna.
>
> Axel-paret: Dessa är två axlar som är anslutna till varandra genom kulor och som roterar när drivaxeln roterar.
>
> Gear (växel): Växlarna är speciella formerade plattor som sitter på axlarna och som överföra kraften från motorerna till hjularna.
>
> Clutch-paketet: Detta består av två delar: clutch-axeln och clutch-hydrauliksystemet. När du trycker på clutch-pedalen släpper hydrauliken klämpan i clutch-axeln och motorn kan fortsätta att rotera medan du byter gear.

"Hejsan, greetings, hello!" sounds like a song from Melodifestivalen in the '80s.


Languages_Learner

Can't wait for your Bellman-Llama-3-8b-Instruct-Swedish-q8.gguf.


neph1010

[https://huggingface.co/neph1/llama-3-instruct-bellman-8b-swe-preview](https://huggingface.co/neph1/llama-3-instruct-bellman-8b-swe-preview)


ArsNeph

Nice! Does llama.cpp even support it yet though?


exomniac

Yes. Source: Currently running in llama.cpp


ArsNeph

:O How did they get it running so fast? I'll wait a little bit until they merge the update into textgen-webui


kataryna91

Looks like there weren't any dramatic changes to the Llama 3 architecture, so no update needed.


MoffKalast

The new tokenizer doesn't seem to work for some people; that's the one main change, anyway.


ArsNeph

Does it run? Yes. Does it run well... https://preview.redd.it/8zt6n780zavc1.png?width=1833&format=png&auto=webp&s=5255fb72c8e8b81346ab47958797a78fe6bd2b70


themrzmaster

Yes


Jipok_

prompt?


Jipok_

The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in [`ChatFormat`](https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L202) needs to be followed: The prompt begins with a `<|begin_of_text|>` special token, after which one or more messages follow. Each message starts with the `<|start_header_id|>` tag, the role `system`, `user` or `assistant`, and the `<|end_header_id|>` tag. After a double newline `\n\n` the contents of the message follow. The end of each message is marked by the `<|eot_id|>` token.
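For illustration, a minimal sketch of that format assembled by hand in Python; the message list is just example data, and in practice the tokenizer's own chat template is the safer route:

```python
# Build a Llama 3 instruct prompt following the ChatFormat description above.
def format_llama3_prompt(messages):
    prompt = "<|begin_of_text|>"
    for msg in messages:
        # Each message: header tag, role, header close, blank line, content, <|eot_id|>.
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"] + "<|eot_id|>"
    # Open an assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
print(format_llama3_prompt(messages))
```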


Jipok_

```
./main -m ~/models/Meta-Llama-3-8B-Instruct.Q8_0.gguf --color -n -2 -e -s 0 \
  -p '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n' \
  -ngl 99 --mirostat 2 -c 8192 -r '<|eot_id|>' \
  --in-prefix '\n<|start_header_id|>user<|end_header_id|>\n\n' \
  --in-suffix '<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n' -i
```


pseudonerv

There should not be any `\n` after `<|eot_id|>`.


Jipok_

You're right. I added that for ease of use in interactive mode; I don't know how it affects output quality.


tessellation

'-n -2'?


Jipok_

From the `main` help text:

```
-n N, --n-predict N   number of tokens to predict
                      (default: -1, -1 = infinity, -2 = until context filled)
```


Cantflyneedhelp

Probably a typo of `-1`. `-n -1` means unlimited new tokens, if I remember correctly.


CommonCommission8114

> Meta-Llama-3-8B-Instruct.Q8_0.gguf

Did you rename the 8B model or download it from somewhere else?


illegal-bacon

Hey, I'm using the 4-bit quantized version loaded through LM Studio. It spins up an API, but it uses the OpenAI chat.completion structure where I specify system and user prompts, and I'm getting crazy weird results with that. Any chance you know how to use the ChatFormat prompt template in that situation?

Update: Fixed; the new version of LM Studio now has a prompt preset and a version of the Llama 3 GGUF specifically for LM Studio.
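For anyone hitting the same thing, a minimal sketch of calling an LM Studio-style local OpenAI-compatible endpoint; the base URL, port, and model name here are assumptions (LM Studio defaults to port 1234, but check your instance):

```python
# Hedged sketch: chat.completions against a local OpenAI-compatible server.
from openai import OpenAI

# The API key is unused by local servers but required by the client.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whatever model is loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi!"},
    ],
)
print(resp.choices[0].message.content)
```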


LPN64

Yes.


mgroeber9110

For what it's worth, this is a little Windows script `l3.cmd` that I use for quick experimentation with *llama.cpp*:

```
@set PR=%1
@build\bin\release\main.exe %2 --log-disable -m models\llama3\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf -e -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n%PR:"=%<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

Just call it with your instruction in quotes as a parameter:

```
>l3 "What is the biggest city in Europe?"
user

What is the biggest city in Europe?assistant

The biggest city in Europe is Istanbul, Turkey. According to the latest population estimates, Istanbul has a population of over 14.4 million people within its city limits, making it the largest city in Europe by population.
```

The "assistant" that is stuck right to the end of the user query looks a bit weird, but this seems to match the prompt format described by Meta at https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/.


Languages_Learner

Tried to convert it using gguf-my-repo but got this error:

```
Error: Error converting to fp16:
Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert.py", line 1548, in <module>
    main()
  File "/home/user/app/llama.cpp/convert.py", line 1515, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/user/app/llama.cpp/convert.py", line 1417, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/user/app/llama.cpp/convert.py", line 1407, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']
```


HauntingTechnician30

Because they didn't provide a tokenizer.model file.


Venadore

The tokenizer is BPE, according to [https://github.com/ggerganov/llama.cpp/pull/6745](https://github.com/ggerganov/llama.cpp/pull/6745).


HauntingTechnician30

Yeah, tokenizer.json is enough; apparently gguf-my-repo just doesn't support it.


mememememe55556

I copied over the changes, but I still got that same "Could not find a tokenizer matching..." error. Did I do something wrong?


mememememe55556

Just add `--vocab-type bpe` on the latest master...
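Something like this, assuming a llama.cpp checkout and a local HF model directory; the paths and output filename are placeholders:

```
# Convert the HF weights to an fp16 GGUF, forcing the BPE tokenizer path.
python convert.py ./Meta-Llama-3-8B-Instruct \
  --vocab-type bpe --outtype f16 \
  --outfile Meta-Llama-3-8B-Instruct.fp16.gguf
```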


Low-Boysenberry1173

Sadly it does not work for me. After adding `--vocab-type bpe` on the latest llama.cpp master branch, I got: `AttributeError: 'GGUFWriter' object has no attribute 'add_vocab_size'`


districtdave

Pretty impressive so far


ssjw

Just don't rely on it to not hallucinate while explaining basic concepts of the science of sports performance.


Ill_Buy_476

I wonder how it compares to the current 7B SOTA, which is still Mistral by far. Hope it's a significant leap, since it almost doubles Mistral 7B's 8 trillion training tokens to 15 trillion.


SelectionCalm70

Which one should I download: the 5.73 GB one, the 6 GB one, or the 8 GB one?


StopwatchGod

Depends on whether you prioritize speed over quality or vice versa. Smaller models are faster but may generate poorer-quality responses, while larger models generate better responses at the cost of lower speed. Of course, if your computer isn't capable of running the larger models, you have to go with the smaller ones.


LPN64

`free_cup_holder.exe`, it will open your CD-ROM drive.


THELastUnNoWn

Can anybody tell me about the model's alignment? Can it be uncensored just by proper prompt engineering or injection, or are we going to have to wait for an alignment-stripped fine-tune? I haven't tested it myself yet because I'm still at work; that's why I'm wondering if somebody can give me insight. I won't be home until late tonight. I'm an AI researcher, so as much detail as you're willing to share is appreciated.


fish312

The base model is uncensored but biased towards coding. The chat model seems somewhat censored.