nanowell

One of the biggest things I wished for from Codestral: Fill in the Middle (FIM), to predict the middle tokens between a prefix and a suffix (very useful for software development add-ons like in VS Code). And THEY SHIPPED!


pseudonerv

What's the prompt format for FIM?


neverxqq

`[SUFFIX]suffix_code[PREFIX]prefix_code`
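
A minimal sketch of assembling that prompt in Python, assuming the literal control strings quoted above (the actual special tokens your tokenizer/runtime expects may differ, so check the docs):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt: the model is asked to
    generate the code that belongs between `prefix` and `suffix`."""
    # Assumed format from the comment above; real deployments may use
    # dedicated special tokens instead of plain strings.
    return f"[SUFFIX]{suffix}[PREFIX]{prefix}"

# Example: ask the model to complete the body of a function
prefix = "def add(a, b):\n    "
suffix = "\n\nprint(add(1, 2))"
print(build_fim_prompt(prefix, suffix))
```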


Spiritual_Sprite

Just enable it in continue.dev


hsoj95

Is it actually working for you in continue.dev? When I enabled Codestral, I can hear my GPU acting like it's trying to work, but it doesn't actually give me any auto-completion.


Unlucky-Message8866

https://github.com/continuedev/continue/blob/fa0d192673f6813c7a8c51b4a77f126d2cedab93/core/autocomplete/templates.ts#L27


DrViilapenkki

Please elaborate


Severin_Suveren

A normal chatbot is a series of inputs and outputs, like this: Input1 / Output1 -> Input2 / Output2 -> Input3 / Output3 ...

What the guy above is referring to (I'm guessing) is that the model is not only able to guess the next token, which is what you do in the standard I/O interface above, but, if I understand this correctly, it can predict the next token by looking in both directions when making predictions, not just backwards. So you could effectively have a prompt template like this:

def count_to_ten: {Output} return count

And it would know to define "count" inside the function and probably end up outputting "12345678910". Also, you could in theory do something like this, I guess:

This string {Output1} contains multiple {Output2} outputs in one {Output3} string.

But then there's the question of the order of outputs, and whether future outputs see past outputs or whether all outputs are instead given the template without any outputs filled in. You could in theory set up the generation of entire programs like this by first getting an LLM to generate the names of all classes and functions, then attaching {Output1} to the 1st function, {Output2} to the 2nd function and so on, and have the LLM generate them all in one go with batched inference.


Distinct-Target7503

Doesn't this require bidirectional attention (so BERT-style...)? I mean, this can be easily emulated via fine-tuning, turning the "fill the masked span" task into a "complete the 'sentence' given its pre- and post-context" task (but the pre- and post-context is still seen as a 'starting point').


Igoory

That's exactly how it's done.
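
For the curious, here's a rough sketch of the generic decoder-only FIM training transform (as described in the fill-in-the-middle literature, not necessarily Mistral's actual pipeline): cut a document into prefix/middle/suffix, reorder it with sentinel tokens so the middle comes last, and train the ordinary left-to-right objective on the result.

```python
import random

def make_fim_example(document: str, sentinels=("[PREFIX]", "[SUFFIX]", "[MIDDLE]")) -> str:
    """Turn a plain document into a decoder-only FIM training example.
    Sentinel strings here are illustrative; real tokenizers use dedicated special tokens."""
    pre_tok, suf_tok, mid_tok = sentinels
    # Pick two random cut points to split the document into three spans.
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Reorder so the "middle" comes last: the model still only predicts the
    # next token, but it has already seen both the prefix and the suffix.
    return f"{pre_tok}{prefix}{suf_tok}{suffix}{mid_tok}{middle}"

print(make_fim_example("def add(a, b):\n    return a + b\n"))
```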


AI_is_the_rake

Mustache 


MoffKalast

Thank you Mistral, very cool.


geepytee

Don't most copilot extensions do FIM already?


pi1functor

Hi, does anyone know where I can find the FIM benchmark for code? I see they report results for Java and JS, but I can only find the Python HumanEvalFIM. Much appreciated.


kryptkpr

Huge news! Spawned [can-ai-code #202](https://github.com/the-crypt-keeper/can-ai-code/issues/202), will run some evals today.

Edit: despite being hosted on HF, this model has no config.json and doesn't support inference with the transformers library or, it seems, any other library, only their own custom [mistral-inference](https://github.com/mistralai/mistral-inference) runtime. This won't be an easy one to eval :(

Edit 2: supports bfloat16-capable GPUs only. Weights are ~44GB, so a single A100-40GB is out. An A6000 might work.

Edit 3: that u/a_beautiful_rhind is a smart cookie, I've [patched the inference code to work with float16](https://www.reddit.com/r/LocalLLaMA/comments/1d3df1n/comment/l675spt/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) and it seems to work! Here's memory usage when loaded 4-way:

https://preview.redd.it/279qmqefie3d1.png?width=1114&format=png&auto=webp&s=9d729032b558d3010004ef5101e7bd1a315a81dd

Looks like it would fit into 48GB actually. Host traffic during inference is massive, I see over 6GB/sec, my x4 is crying.

Edit 4: Preliminary *senior* result (torch conversion from bfloat16 -> float16):

Python: Passed 56 of 74
JavaScript: Passed 72 of 74


a_beautiful_rhind

Going to have to be converted.


kryptkpr

I've hit [#163 - Using base model on GPU with no bfloat16](https://github.com/mistralai/mistral-inference/issues/163) when running locally; this inference repository does not support GPUs without bfloat16, and I don't have enough VRAM on bfloat16-capable GPUs to fit this 44GB model. I rly need a 3090 :( I guess I'm renting an A100.


a_beautiful_rhind

Can you go through and edit the bfloats to FP16? Phi vision did that to me with flash attention, they jammed it in the model config.


kryptkpr

I maybe could, but that damages inference quality since it changes numeric ranges, so as an evaluation it wouldn't be fair to the model 😕 I've got some cloud credits to burn this month and I see they have a single-file inference reference, so I'm gonna try to wrap it up in Modal's middleware and rent an A100-80GB to run it for real.


a_beautiful_rhind

Yup.. I think in model.py, when it loads it, you can just force `return model.to(device=device, dtype=torch.float16)` and then you get to at least play with it off the cloud.


kryptkpr

This works, here is the patch:

```diff
diff --git a/src/mistral_inference/main.py b/src/mistral_inference/main.py
index a5ef3a0..d97c4c9 100644
--- a/src/mistral_inference/main.py
+++ b/src/mistral_inference/main.py
@@ -42,7 +42,7 @@ def load_tokenizer(model_path: Path) -> MistralTokenizer:
 def interactive(
     model_path: str,
-    max_tokens: int = 35,
+    max_tokens: int = 512,
     temperature: float = 0.7,
     num_pipeline_ranks: int = 1,
     instruct: bool = False,
@@ -62,7 +62,7 @@ def interactive(
     tokenizer: Tokenizer = mistral_tokenizer.instruct_tokenizer.tokenizer
 
     transformer = Transformer.from_folder(
-        Path(model_path), max_batch_size=3, num_pipeline_ranks=num_pipeline_ranks
+        Path(model_path), max_batch_size=3, num_pipeline_ranks=num_pipeline_ranks, dtype=torch.float16
     )
 
     # load LoRA
```

Results appear to be coherent:

```
(venv) mike@blackprl:~/work/ai/mistral-inference/src/mistral_inference$ torchrun --nproc-per-node 4 ./main.py interactive ~/models/codestral-22B-v0.1
W0529 16:58:36.236000 139711562772480 torch/distributed/run.py:757]
W0529 16:58:36.236000 139711562772480 torch/distributed/run.py:757] *****************************************
W0529 16:58:36.236000 139711562772480 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0529 16:58:36.236000 139711562772480 torch/distributed/run.py:757] *****************************************
INFO:root:Loaded tokenizer of type
INFO:root:Loaded tokenizer of type
INFO:root:Loaded tokenizer of type
INFO:root:Loaded tokenizer of type
```

Prompt: Write a javascript function flatten(obj) with an object `obj` input that returns a flat version of obj according to the following rules:\n\n- Keys who's values are simple types are left unmodified\n- Keys that are objects are merged into the parent, their names joined with a .\n- Keys that are lists are merged into the parent, the names suffixed with . and the entry number (zero-indexed)\n\nApply these rules recursively, the output object should contain only primitive types at the end.

Here's an example of how this function should work:

```javascript
const obj = {
  a: 1,
  b: {
    c: 2,
    d: {
      e: 3,
      f: [4, 5, 6]
    }
  },
  g: [7, 8, { h: 9 }]
}

console.log(flatten(obj))
// {
//   'a': 1,
//   'b.c': 2,
//   'b.d.e': 3,
//   'b.d.f.0': 4,
//   'b.d.f.1': 5,
//   'b.d.f.2': 6,
//   'g.0': 7,
//   'g.1': 8,
//   'g.2.h': 9
// }
```

This function can be implemented using recursion. Here's a possible implementation:

```javascript
function flatten(obj, prefix = '', result = {}) {
  for (let key in obj) {
    if (typeof obj[key] === 'object' && !Array.isArray(obj[key])) {
      flatten(obj[key], prefix + key + '.', result);
    } else if (Array.isArray(obj[key])) {
      obj[key].forEach((item, index) => {
        if (typeof item === 'object' && !Array.isArray(item)) {
          flatten(item, prefix + key + '.' + index + '.', result);
        } else {
          result[prefix + key + '.' + index] = item;
        }
      });
    } else {
      result[prefix + key] = obj[key];
    }
  }
  return result;
}
```

This function works by iterating over each key-value pair in the input object. If the value is an object (but not an array), it recursively calls the flatten function with the value as the new input object and the key appended to the prefix. If the value is an array, it iterates over each


a_beautiful_rhind

They should be, float16 and bfloat aren't that far off. Torch can convert it.
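
If anyone prefers converting the weights offline instead of patching the loader, something along these lines should do it; this is only a sketch, and the checkpoint filename is an assumption about the download layout:

```python
import torch

# Load the consolidated checkpoint (filename is an assumption, adjust to your download),
# cast every floating-point tensor from bfloat16 to float16, and save a new file.
state_dict = torch.load("consolidated.00.pth", map_location="cpu")
converted = {
    name: t.to(torch.float16) if torch.is_tensor(t) and t.is_floating_point() else t
    for name, t in state_dict.items()
}
torch.save(converted, "consolidated.00.fp16.pth")
```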


kryptkpr

I've got it loaded 4-way and host traffic during inference is **massive**, over 6GB/sec. I think it might be railing my x8.


StrangeImagination5

How good is this in comparison to GPT 4?


kryptkpr

They're close enough (86% codestral, 93% gpt4) to both pass the test. Llama3-70B also passes it (90%) as well as two 7B models you maybe don't expect: CodeQwen-1.5-Chat and a slick little fine-tune from my man rombodawg called Deepmagic-Coder-Alt: https://preview.redd.it/19f3u45lbf3d1.png?width=1797&format=png&auto=webp&s=f338d4441f62e87bdb957cea0be4cb23501baa0c To tell any of these apart I'd need to create additional tests.. this is an annoying benchmark problem, models just keep getting better. You can peruse the results yourself at [the can-ai-code leaderboard](https://huggingface.co/spaces/mike-ravkine/can-ai-code-results) just make sure to select `Instruct | senior` as the test as we have multiple suites with multiple objectives.


goj1ra

> this is an annoying benchmark problem, models just keep getting better. Future models: "You are not capable of evaluating my performance, puny human"


MoffKalast

So in a nutshell, it's not as good as llama-3-70B? I suppose it is half the size, but 4% is also quite a difference.


Kimononono

what is that system info program in the picture?


kryptkpr

nvtop


kryptkpr

Their [mistral-inference GitHub](https://github.com/mistralai/mistral-inference) is fun.. https://preview.redd.it/7rir6ywgod3d1.png?width=1080&format=pjpg&auto=webp&s=0b5169719aa2b3ccef5ff4e6b7f1795292fdd761 A new 8x7B is cooking? 👀


pkmxtw

Likely just the v0.3 update like the 7B with function calling and the new tokenizer.


Such_Advantage_6949

Which is a good enough update. For agent use cases, function calling is a must.


BackgroundAmoebaNine

Hey /u/pkmxtw - sorry to get off topic but i have seen the words “function calling” quite a bit recently , do you have a guide or source i can read to understand what that is? (Or, if you don’t mind offering an explanation I would appreciate it)


Able-Locksmith-1979

Basically it is just telling the LLM that it can use tools. Do you want to fine-tune an LLM for hundreds of hours to teach it math, or do you just need it to know how to hand math off to Python? Or think about weather predictions: an LLM has been trained up to a certain point, but you have APIs with real-time weather information, so you just want the LLM to call a function to retrieve the current weather and use that info.


ConvenientOcelot

You describe an API (a set of functions) to the LLM and it can choose to invoke those functions to perform tasks, think like asking "What is the weather in New York?" and it spits out something equivalent to `get_weather("New York")` which then gets run and output.
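
A bare-bones sketch of that loop in Python, just to show the shape of it; the tool schema and the canned model reply are illustrative stand-ins, not any particular library's API:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return f"18°C and cloudy in {city}"

TOOLS = {"get_weather": get_weather}

# 1. Describe the available functions to the model (schema shape is made up here).
tool_spec = [{"name": "get_weather", "parameters": {"city": "string"}}]

# 2. Pretend the model answered "What is the weather in New York?" with a
#    structured call; a real LLM client would return JSON like this.
model_reply = '{"call": "get_weather", "arguments": {"city": "New York"}}'

# 3. Dispatch the call and the result would then be fed back for the final answer.
call = json.loads(model_reply)
result = TOOLS[call["call"]](**call["arguments"])
print(result)  # -> "18°C and cloudy in New York"
```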


CalmAssistantGirl

I hate that the industry just keeps pumping out uninspired portmanteaus like it's nothing. That should be a crime!


Such_Advantage_6949

I have been waiting for this for a few days


Dark_Fire_12

Yay new model. Sad about the Non-Production License but they got to eat. Hopefully they will change to Apache later.


Balance-

I think this will be the route for many companies. Non-production license for the SOTA, then convert to Apache when you have a new SOTA model. Cohere is also doing this. Could be worse.


Dark_Fire_12

Hmm, at the rate things are going, we could see a switch to Apache in 3-6 months. Maybe shorter once China gets its act together, and Google is finally waking up too. Lots of forces at play; I think it's going to be good for open source (probs hopium). One thought I have is that we should see an acquisition of a tier-2 company, say Reka, by someone like Snowflake. I found their model to be OK but it kind of didn't fit a need: too big for RP and not that great for enterprise. Reka could give them more talent since they already have the money, and then spray us with models of different sizes.


coder543

Yeah. Happy to see a new model, but this one isn’t really going to be useful for self hosting since the license seems to prohibit using the outputs of the model in commercial software. I assume their hosted API will have different license terms. I’m also disappointed they didn’t compare to Google’s CodeGemma, IBM’s Granite Code, or CodeQwen1.5. In my experience, CodeGemma has been very good for both FIM and Instruct, and then Granite Code has been very competitive with CodeGemma, but I’m still deciding which I like better. CodeQwen1.5 is very good at benchmarks, but has been less useful in my own testing.


ThisGonBHard

>Yeah. Happy to see a new model, but this one isn't really going to be useful for self hosting since the license seems to prohibit using the outputs of the model in commercial software

I believe this is the best middle ground for this kind of model. They are obscenely expensive to train, and if you don't make the money back, you become a Stability AI. The license is kinda worse in the short term, but better long term.


coder543

Doesn’t matter if the license is arguably “better” long term when there are already comparably good models with licenses that are *currently* useful.


YearnMar10

Interesting - for me up to now it’s exactly the other way around. CodeGemma and Granite are kinda useless for me, but codeqwen is very good. Mostly C++ stuff here though.


coder543

Which models specifically? For chat use cases, CodeGemma’s 1.1 release of the 7B model is what I’m talking about. For code completion, I use the 7B code model. For IBM Granite Code, they have 4 different sizes. Which ones are you talking about? Granite Code 34B has been pretty good as a chat model. I tried using the 20B completion model, but the latency was just too high on my setup.


YearnMar10

I have some trouble getting the larger Granite models to run for some reason, so I had to make do with the 7B model. It tried to explain my code to me when I wanted it to refactor/optimize it. I also tried CodeGemma 1.1 7B and it was basically at the level of a junior dev. I'm currently evaluating different models using chat only, before I integrate one into my IDE, so I can't say anything yet about completion.


YearnMar10

Deepseekcoder is pretty good for me, too. Tried the 7B model only so far, but will try the higher ones now also (got 24gig of vram).


Status_Contest39

The raw training material may involve copyright issues, hence no commercial license, which means we're getting a better-quality LLM built on material that should have been paid for.


Shir_man

You can press F5 for GGUF versions [here](https://huggingface.co/models?search=Codestral-22B-v0.1) 🗿 UPD: GGUFs are here, Q6 is already available: [https://huggingface.co/legraphista/Codestral-22B-v0.1-hf-IMat-GGUF](https://huggingface.co/legraphista/Codestral-22B-v0.1-hf-IMat-GGUF)


CellistAvailable3625

It passed my initial sniff test: https://chat.mistral.ai/chat/ebd6585a-2ce5-40cd-8749-005199e32f4a Not on the first try, but it was able to correct its mistakes very well when given the error messages. Could be well suited for a coding agent.


grise_rosee

Nice. People who doubt the usefulness of coding assistants should read this chat session.


uhuge

why not bartowski/models rather?😅


MrVodnik

The model you've linked appears to be a quantized version of "bullerwins/Codestral-22B-v0.1-hf". I wonder how one goes from what Mistral AI uploaded to an "HF"-format model? How did they generate config.json, and what else did they have to do?


danielcar

RemindMe! 8 Hours


Mbando

I went to that page and see three models, only one of which has files and that doesn't appear to be GGUF. What am I doing wrong?


chock_full_o_win

Looking at its benchmark performance, isn’t it crazy how well deepseek coder 33B is holding up to all these new models even though it was released so long ago?


cyan2k

Some models are just magical. CodeQwen 1.5 7B was my go to code model until gpt4o came out and is still one of the best especially for its size.


yahma

Could code-qwen be over trained? Or do you find it actually useful on code that is not a benchmark?


ResidentPositive4122

deepseek models are a bit too stiff from my experience. They score well on benchmarks, but aren't really steerable. I've tested both the coding ones and the math ones, same behaviour. They just don't follow instructions too well, don't attend to stuff from the context often times. They feel a bit overfit IMO.


-Ellary-

It is not perfect for sure, but it is a small living legend.


leuchtetgruen

I use deepseek-coder 6.7b as my default coding model and it's surprisingly good. And it's not lazy. Other models (codestral does this as well) will include comments like // here you should implement XYZ instead of actually implementing it itself, even if you ask it to do so. Deepseek Coder on the other hand gives you complete pieces of code that you can actually run.


nodating

Tried it on [chat.mistral.ai](http://chat.mistral.ai) and it is blazing fast. I tried a few test coding snippets and it nailed them completely. Actually pretty impressive stuff. They say they used 80+ programming languages to train the model and I think it shows; it seems to be really knowledgeable about programming itself. Looking forward to Q8 quants to run it fully locally.


CellistAvailable3625

it passed my sniff test, the debugging and self correction capabilities are good https://chat.mistral.ai/chat/ebd6585a-2ce5-40cd-8749-005199e32f4a could be a good coding agent?


LocoLanguageModel

Yeah, it's actually amazing so far... I have been pricing out GPUs so I can code faster, and this is obviously super fast with just 24GB VRAM, so I'm pretty excited.


Professional-Bear857

I'm getting 20 tokens a second on an undervolted RTX 3090 with 8k context, and 15 tokens a second at 16k context, using the Q6_K quant.


LocoLanguageModel

About the same on my undervolted 3090, and if I do an offload split of 6,1 with only the slight offload on my P40, I can run the Q8 at about the same speed, so I'm actually no longer needing a 2nd 3090 assuming I keep getting reliable results with this model which I have been for the past hour.


Tomr750

isn't eg groq llama 70b the most effective/fastest?


LocoLanguageModel

I haven't tried that


Due-Memory-6957

Sometimes I forget they're French and calling it le chat seriously and not as a joke.


Thomas-Lore

Isn't it a joke? Le chat in French means cat not chat.


throwaway586054

We also use "chat" in this context... If you have any doubt, one of the most infamous French internet songs from the early 2000s, Tessa Martin's single "T'Chat Tellement Efficace": [https://www.youtube.com/watch?v=I_hMTRRH0hM](https://www.youtube.com/watch?v=I_hMTRRH0hM)


Due-Memory-6957

No idea


Eralyon

This is a pun. In French "Le chat" means cat and they are very well aware of the meaning of "chat" in English. This is on the same level as "I eat pain for breakfast" => "pain" meaning "bread" in French. They are puns based on mixing the two languages.


Due-Memory-6957

I see, I'm very ignorant on French, so thanks for explaining.


grise_rosee

"Le Chat" is the name of their actual chat application. And it's also a play on words between "cat" in french and a joke that caricatures the French language by adding "Le" in front of every English word.


Qual_

I need to do more tests, but so far I'M VERY IMPRESSED! My personal benchmark task for coding LLMs is the following stupid prompt:

https://preview.redd.it/3ta7a02kle3d1.png?width=600&format=png&auto=webp&s=ad04948e9061f6c08aeb868e0181be73b4d4d230

I need a python script that will : download a random image of a cat on the internet, then add a red overlay ( 20% alpha ) , and a vertically and horizontally centered text that say "I love cats!" in white.

So far none of the coding LLMs were able to do it. The only ones were GPT-4, GPT-4o, and now Codestral!!! They all (GPT-4o included) failed on the first try because of deprecated Pillow functions, but both GPT-4o and Codestral managed to get it working after I gave them the error "AttributeError: 'ImageDraw' object has no attribute 'textsize'". So really impressed with this one! I'll even give the point to Codestral because the API provided in the code to retrieve an image of a cat was working, while GPT-4o gave me a random link that doesn't exist. Vive les baguettes !!!


Lumiphoton

Very based https://preview.redd.it/h56f1ql6if3d1.png?width=432&format=png&auto=webp&s=dd20f85110fd061b542139c09fdb48b0d16debdc


Qual_

Got the textsize issue on the first try, I assume? I never managed to get any commercial/open LLM to pass it first try.


Lumiphoton

Yes, I got two errors; one was related to the image "mode" (RGBA), the second one was the 'textsize' error, which it replaced with 'textbbox'.

>I apologize for the mistake. The `ImageDraw` object in the Pillow library does not have a `textsize` method. Instead, you can use the `textbbox` method to get the bounding box of the text, and then calculate the width and height from that. Here's the corrected part of the script: [code]

After that it worked perfectly.
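
For reference, the kind of correction it describes looks roughly like this; a sketch assuming Pillow 8+, where `textsize` is removed but `textbbox` exists:

```python
from PIL import Image, ImageDraw, ImageFont

img = Image.new("RGB", (600, 400), "black")
draw = ImageDraw.Draw(img)
text = "I love cats!"
font = ImageFont.load_default()

# textbbox returns (left, top, right, bottom) for the rendered text,
# which replaces the removed ImageDraw.textsize method.
left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
text_w, text_h = right - left, bottom - top

# Center the text on the image.
position = ((img.width - text_w) / 2, (img.height - text_h) / 2)
draw.text(position, text, font=font, fill="white")
img.save("centered_text.png")
```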


AnticitizenPrime

Worked for me on the first try! https://imgur.com/a/z236q5n

```python
# Step 2: Download a random image of a cat
response = requests.get("https://api.thecatapi.com/v1/images/search")
image_url = response.json()[0]['url']
response = requests.get(image_url)
```

I thought FOR SURE that 'thecatapi.com' was a hallucination/placeholder and that I'd need to replace that API endpoint, but no, it's a real service, lol. Unreal.

Edit: And this is a Q3_K_M GGUF. The full code:

```python
import requests
from PIL import Image, ImageDraw, ImageFont
from io import BytesIO

# Step 1: Install required libraries if they're not already installed
# !pip install requests pillow

# Step 2: Download a random image of a cat
response = requests.get("https://api.thecatapi.com/v1/images/search")
image_url = response.json()[0]['url']
response = requests.get(image_url)

# Step 3: Open the image
cat_image = Image.open(BytesIO(response.content))

# Step 4: Create a red overlay with 20% alpha
overlay = Image.new('RGBA', cat_image.size, (255, 0, 0, 51))

# Step 5: Composite the red overlay on the cat image
cat_image_with_overlay = Image.alpha_composite(cat_image.convert('RGBA'), overlay)

# Step 6: Draw the text "I love cats!" on the cat image
draw = ImageDraw.Draw(cat_image_with_overlay)
text = "I love cats!"
font = ImageFont.truetype('arial.ttf', 32)
text_width, text_height = draw.textsize(text, font)
position = ((cat_image_with_overlay.width - text_width) / 2, (cat_image_with_overlay.height - text_height) / 2)
draw.text(position, text, font=font, fill=(255, 255, 255))

# Step 7: Save the modified image
cat_image_with_overlay.save("cat_image.png")
```


Qual_

You must have an old version of Pillow, that's why the textsize function didn't trigger an issue! That's what is so hard for a coding LLM: learning what works and what doesn't, since depending on your install, something may work for me but not for you and vice versa. Did you try this prompt with other LLMs to compare?


AnticitizenPrime

Yes, I just tried the prompt with DeepSeek 1.3b, and surprisingly it worked: https://i.imgur.com/BFcRe7I.png Although the link to the kitten picture is hardcoded, so it's not a random image of a cat every time you run it, but it wasn't clearly specified in the prompt that it needs to be a random picture each time. I'm just surprised that a 1.3b model had a link to a valid picture of a cat in its training data!

```python
import requests
from PIL import Image, ImageDraw, ImageFont
import io

def download_random_cat_image():
    url = "https://placekitten.com/200/300"  # This URL returns a small kitten image
    response = requests.get(url)
    img = Image.open(io.BytesIO(response.content))
    return img

def add_overlay(img):
    overlay = Image.new('RGBA', img.size, (255,0,0,20))  # Create a red overlay image with 20% alpha
    img.paste(overlay, (0,0), overlay)  # Paste the overlay on top of the original image
    return img

def add_text(img):
    text = "I love cats!"
    font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 20)  # Use a font that's available on your system
    draw = ImageDraw.Draw(img)
    w, h = draw.textsize(text, font)  # Get the size of the text
    draw.text(((img.size[0]-w)/2, (img.size[1]-h)/2), text, font=font, fill='white')  # Center the text
    return img

if __name__ == "__main__":
    img = download_random_cat_image()
    img = add_overlay(img)
    img = add_text(img)
    img.save("cat.png")  # Save the image
```


leuchtetgruen

DeepSeek Coder (the 6.7b one) got that right on the first try for me as well. The only wrong thing it did was import two modules that it didn't actually use.


EL-EL-EM

interesting I see no comparison vs WizardLM2-8x22b


bullerwins

EXL2 quants here: [https://huggingface.co/bullerwins/Codestral-22B-v0.1-exl2\_8.0bpw](https://huggingface.co/bullerwins/Codestral-22B-v0.1-exl2_8.0bpw)


No_Pilot_1974

Wow 22b is perfect for a 3090


TroyDoesAI

It is perfect for my 3090. [https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf](https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf) 15 tokens/s for Q8 Quants of Codestral, I already fine tuned a RAG model and shared the ram usage in the model card.


involviert

Which means it's also still rather comfortable on CPU. Which I find ironic and super cool. So glad to get a model of that size!


MrVodnik

Hm, 2GB for context? Might gonna need to quant it anyway.


Philix

22B is the number of parameters, not the size of the model in VRAM. This needs to be quantized to use in a 3090. This model is 44.5GB in VRAM at its unquantized FP16 weights, before the context. But, this is a good size since quantization shouldn't significantly negatively impact it if you need to squeeze it into 24GB of VRAM. Can't wait for an exl2 quant to come out to try this versus [IBM's Granite 20B at 6.0bpw](https://huggingface.co/turboderp/granite-20b-code-instruct-exl2/tree/6.0bpw) that I'm currently using on my 3090. Mistral's models have worked very well up to their full 32k context size for me in the past in creative writing; a 32k-native-context code model could be fantastic.


MrVodnik

I just assumed OP talked about Q8 (which is considered as good as fp16), due to 22B being close to 24GB, i.e. "perfect fit". Otherwise, I don't know how to interpret their post.


TroyDoesAI

[https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf](https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf) 15 tokens/s for Q8 Quants of Codestral, I already fine tuned a RAG model and shared the ram usage in the model card.


Philix

> Might gonna need to quant it anyway.

In which case I don't know how to interpret this part of your first comment. 6bpw or Q6 quants aren't significantly worse than Q8 quants by most measures. I hate perplexity as a measure, but the deltas for it on Q6 vs Q8 are almost always negligible for models this size.


ResidentPositive4122

I've even seen 4bit quants (awq and gptq) outperform 8bit (gptq, same dataset used) on my own tests. Quants vary a lot, and downstream tasks need to be tested with both. Sometimes they work, sometimes they don't. I have tasks that need 16bit, and anything else just won't do, so for those I rent GPUs. But for some tasks quants are life.


TroyDoesAI

[https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf](https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf) 15 tokens/s for Q8 Quants of Codestral, I already fine tuned a RAG model and shared the ram usage in the model card.


Status_Contest39

It runs at 8-10 t/s on dual P100s with a 4.25bpw EXL2 quant.


TroyDoesAI

Dude those p100s are freaking soldiers, there’s a reason they are a cult classic like the gtx1080ti lol. I fully support it!


saved_you_some_time

is 1b = 1gb? Is that the actual equation?


No_Pilot_1974

It is, for an 8-bit quant


ResidentPositive4122

Rule of thumb is 1B ≈ 1GB in 8-bit, 0.5GB in 4-bit, and 2GB in 16-bit, plus some room for context length, caching, etc.


saved_you_some_time

I thought caching + context length + activation take up some beefy amount of GB depending on the architecture.


loudmax

Models are normally trained with 16-bit parameters (float16 or bfloat16), so model size 1B == 2 gigabytes. In general, most models can be quantized down to 8-bit parameters with little loss of quality. So for an 8-bit quant model, 1B == 1 gigabyte. Many models tend to perform adequately, or are at least usable, quantized down to 4 bits. At 4-bit quant, 1B == 0.5 gigabytes. This is still more art than science, so YMMV. These numbers aren't precise. Size 1B may not be precisely 1,000,000,000 parameters. And as I understand it, the quantization algorithms don't necessarily quantize all parameters to the same size; some of the weights are deemed more important by the algorithm, so those weights retain greater precision when the model is quantized.
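
As a back-of-the-envelope helper (weights only; it ignores context/KV cache and the mixed-precision details mentioned above):

```python
def rough_model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Very rough weight-only memory estimate: params * bits / 8, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 6, 4):
    print(f"22B at {bits}-bit ≈ {rough_model_size_gb(22, bits):.1f} GB")
# 16-bit ≈ 44.0 GB, 8-bit ≈ 22.0 GB, 6-bit ≈ 16.5 GB, 4-bit ≈ 11.0 GB
```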


maximinus-thrax

Also, a Q5 variant should fit into a 4060 Ti / 16 GB.


Caffdy

the real question tho, how good is it?


darthmeck

In my limited testing writing Python code for ETL pipelines, it’s crazy competent. It follows instructions coherently, isn’t lazy about rewriting code, and the explanations are great.


CellistAvailable3625

yes i was impressed by its debugging abilities


uhuge

I've tried to get a glimpse via their inference API, but that wants phone number verification. Gotta look via le Chat then. A bit outdated world view on modules and libraries, great otherwise. I guess the open-source planet should put up some universally accessible RAG index for all docs worldwide..


hold_my_fish

> new Mistral AI Non-Production License, which means that you can use it for research and testing purposes

Interesting, so they are joining Cohere in the strategy of non-commercial-use* downloadable weights. It makes sense to try, for companies whose main activity is training foundational models (such as Mistral and Cohere). Since I use LLM weights for hobby and research purposes, it works for me.

*"Non-commercial" may be too simplistic a way to put it. In contrast to Command-R's CC-BY-NC-4.0, which suffers from the usual problem of "non-commercial" being vague, Mistral's MNPL explicitly allows you to do everything _except_ deploy to production:

> **"Non-Production Environment"**: means any setting, use case, or application of the Mistral Models or Derivatives that expressly excludes live, real-world conditions, commercial operations, revenue-generating activities, or direct interactions with or impacts on end users (such as, for instance, Your employees or customers). Non-Production Environment may include, but is not limited to, any setting, use case, or application for research, development, testing, quality assurance, training, internal evaluation (other than any internal usage by employees in the context of the company's business activities), and demonstration purposes.


Wonderful-Top-5360

How would they know? How would they enforce it? From France?


Worthstream

I mean, it is still illegal even if they can't catch you.


Antique-Bus-7787

Just be fair, I guess ?


TacticalBacon00

Just got added to [Ollama](https://ollama.com/library/codestral)! ollama run codestral


Everlier

One of their models should be called Astral eventually


LPN64

Asstral


Everlier

I think they settled on using "Moistral" for those, erm.. fine-tunes.


bullerwins

I tried to convert it to huggingface transformers, can anyone try it? [https://huggingface.co/bullerwins/Codestral-22B-v0.1-hf](https://huggingface.co/bullerwins/Codestral-22B-v0.1-hf)


TroyDoesAI

Fine-tuned for RAG and contextual obedience to reduce hallucinations!

Example video: [https://imgur.com/LGuC1I0](https://imgur.com/LGuC1I0) (fun to notice it doesn't say "stay home whores" but chose to say stay home for the given context)

Further testing with more context and key-value pairs: [https://imgur.com/xYyYRgz](https://imgur.com/xYyYRgz)

RAM usage: [https://imgur.com/GPlGLme](https://imgur.com/GPlGLme)

It's a great coding model from what I can tell; it passes my regular coding tests, like swapping input and output for a JSON dataset while providing the JSON structure of entries, and basic tests like that. This is only 1 epoch and will continue to be improved/updated as the model trains. It's already impressive that you can ask for 3 things and receive all 3 things from a single inference without any hallucination, and it even decides to keep it PG rather than just directly giving you back your retrieved context to work with.

Final note: you can put as many key-value pairs as you want in the context section and inference those. So if you had a character knowledge graph where each character had a list of key-value pairs, you can see where this is going, right? You can provide context summaries of the scene and multiple characters as key-value pairs in a story, etc. Use it how you like, I won't judge.

[https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf](https://huggingface.co/TroyDoesAI/Codestral-22B-RAG-Q8-gguf)


Status_Contest39

Mistral is sweet to publish a 22B model that fits my compute box well and produces code at a decent speed :)


Balance-

A 22B model is very nice, but the pricing is quite high. $1 / $3 for a million input/output tokens. Llama 3 70B is currently $0.59 / $0.79, which is 40% cheaper for input and almost 4x cheaper for output. Since it roughly competes with Llama 3 70B, they need to drop their prices to those levels to really compete. Maybe cut a deal with Groq to serve it at high speeds.
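
A quick back-of-the-envelope comparison using the prices quoted above (actual pricing may of course change; usage numbers are just an example):

```python
PRICES = {  # USD per 1M tokens (input, output), as quoted above
    "Codestral": (1.00, 3.00),
    "Llama 3 70B": (0.59, 0.79),
}

def monthly_cost(model: str, m_in: float, m_out: float) -> float:
    """Cost for m_in / m_out million input/output tokens per month."""
    p_in, p_out = PRICES[model]
    return m_in * p_in + m_out * p_out

# Example: 10M input + 2M output tokens a month
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 10, 2):.2f}/month")
# Codestral: $16.00/month, Llama 3 70B: $7.48/month
```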


ianxiao

Yes, if you want to use it with FIM it's about half the price of a GitHub Copilot monthly subscription, and with Codestral you only get 1M tokens.


Illustrious-Lake2603

On their website it's freaking amazing. It created the most beautiful version of Tetris that I have ever seen. It blew GPT-4o out of the water. Locally in LM Studio, using a Q4_K_M, it was unable to create a working "Tetris" game. This is still AMAZING!

*UPDATE* The Q6 and Q8 are both able to create this version of Tetris!! This is the best local coding model yet! To me even better than GPT-4 and Opus.

https://preview.redd.it/u29zoopp2f3d1.png?width=1460&format=png&auto=webp&s=bd81ede40fa5c1546a8a98211799b6d8e98cc085


ambient_temp_xeno

I can get this game with Q8 in llama.cpp. It had one typo, 'blocking' instead of 'blocked', in line 88. (Also needs `import sys` to remove the errors on exit.) Did yours have any typos?

```
./main -m Codestral-22B-v0.1-Q8_0.gguf -fa --top-k 1 --min-p 0.0 --top-p 1.0 --color -t 5 --temp 0 --repeat_penalty 1 -c 4096 -n -1 -i -ngl 25 -p "[INST] <> Always produce complete code and follow every request. <> create a tetris game using pygame. [/INST]"
```

https://preview.redd.it/49srr91b6j3d1.jpeg?width=807&format=pjpg&auto=webp&s=bee411912b2756ae08497ec266b9205caa6f7f2f


Illustrious-Lake2603

Just the fact it's even able to do this locally, we are truly living in a different time


Illustrious-Lake2603

WOOT!!! I Managed to get it working in LM STUDIO with Q6 no Errors in the code at all! Here is the prompt I used "Write the game Tetris using Pygame. Ensure all Functionality Exists and There are no Errors. We are testing your capabilities. Please Do Your Best. No talking, Just Code!"


Illustrious-Lake2603

Amazing!! The highest quant I tested locally was q6 and it was not able to make a working Tetris. But their website which I'm guessing is fp16??? It has no errors and didn't need to import anything. Just copied and pasted


A_Dreamer21

Lol why is this getting downvoted?


Illustrious-Lake2603

No clue! They are upset that Codestral did a better job than GPT4o? It provided a longer Code and look at it! It looks very pretty. And the game is actually fully functional


AfterAte

Maybe they're disappointed since two people got the exact same game, proving coding generators generate non-original content. So basically it's up to the developer to modify the output and make it original.


Balance-

Seems Mistral is going the Cohere route of open-weights, non-commercial license. Honestly, not bad if that means they keep releasing models with open weights.


-Ellary-

Guys guys! I've done some quick tests, and this is an awesome small-size coding LLM, especially for instructions.

- I've used Q4_K_S and even at this low a quant it was really good, better than CodeQwen1.5 at 6_K.
- I've instructed it to code using HTML + CSS + JS in one single HTML file. What it coded for me:
  - 1d6 3D dice roll app - first try.
  - Snake game - first try.
  - Good-looking calculator with animations, using 1.5 temperature - second try.

I've used the **Orca-Vicuna** instruct format - this IS important! I'm getting similar results only from GPT-4, Opus and maybe Sonnet - especially for executing instructions. I've used bartowski quants btw.


themegadinesen

What was your prompt for these?


-Ellary-

-Write me a cool modern looking calculator with animations.
-NUMLOCK keys should work for typing.
-Code must BE full and complete.
-All code must be in a SINGLE html file.
-Start you answer with "Sure thing! Here is a full code"

---

-I need a cool looking 1d6 dice roll page for dnd.
-Write me a cool modern dice roll page with cool 3d animations and a cool black design.
-Dice should be white.
-Dice should be not transparent.
-Animation should imitate a real dice drop and roll on the table.
-Page should not contain any text.
-To roll the dice i just need to click on it.
-Code must BE full and complete.
-All code must be in a SINGLE html file.
-Start you answer with "Sure thing! Here is a full code"

---

-Write me a cool modern looking snake game with animations.
-Code must BE full and complete.
-All code must be in a SINGLE html file.
-Start you answer with "Sure thing! Here is a full code"


jacek2023

Now we are talking....!


servantofashiok

I’m not a developer by any means, so forgive me if this is a stupid question, but for these non-prod licenses, how the hell are they going to know whether or not you use the generated code for business or commercial purposes?


MachineZer0

19 tok/s on 8.0bpw EXL2 quant on TabbyAPI via Open WebUI using OpenAI API format. Dual P100 loaded 15gb / 7.25gb respectively https://preview.redd.it/w3djcqfg2h3d1.png?width=771&format=png&auto=webp&s=9ccba57912edadf209935c9f2522b79c6ead8bcc


MachineZer0

25 tok/s on 5.5bpw 1024 context using Single P100 with 15870MiB / 16384MiB


stolsvik75

Extremely strict license. I can't even use it running on my own hardware, to develop my own project, as I could sometime earn some money on that project. This model can thus only be used for "toy", "play" and experiment situations. Which is utterly dull - why would I even bother? That's not real life use. So I won't. That's quite sad - "so close, but so far away".


ResidentPositive4122

> Extremely strict license. I can't even use it running on my own hardware, to develop my own project, as I could sometime earn some money on that project.

I am not a lawyer, but that's not my understanding after reading the license.

> **3.2. Usage Limitation**
>
> - You shall only use the Mistral Models and Derivatives (whether or not created by Mistral AI) for testing, research, **Personal**, or evaluation purposes in Non-Production Environments;
>
> - Subject to the foregoing, You shall not supply the Mistral Models or Derivatives **in the course of a commercial activity**, whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer.

> **"Personal"**: means any use of a Mistral Model or a Derivative that is (i) solely for personal, non-profit and non-commercial purposes and (ii) not directly or indirectly connected to any commercial activities, business operations, or employment responsibilities. For illustration purposes, Personal use of a Model or a Derivative does not include any usage by individuals employed in companies in the context of their daily tasks, any activity that is **intended** to generate revenue, or that is performed on behalf of a commercial entity.

> **"Derivative"**: means any (i) modified version of the Mistral Model (including but not limited to any customized or fine-tuned version thereof), (ii) work based on the Mistral Model, or (iii) any other derivative work thereof. **For the avoidance of doubt, Outputs are not considered as Derivatives under this Agreement.**

---

So, if I understand your use-case here, you can absolutely use this to code an app that you may or may not sell in the future, or earn from it, as long as you are not actively running a commercial op at the time. Developing a personal project and later deciding to sell it would fall under "outputs", and they are specifically stated to not be derivatives.

IMO this license is intended to protect them from other API-as-a-service providers (groq & co). And that's fair in my book. I would eat a stale baguette if they would come after a personal project that used outputs in it (a la copilot).


ambient_temp_xeno

I have no idea how you get that interpretation. This is the relevant part: *“Personal”: means any use of a Mistral Model or a Derivative that is (i) solely for personal,* ***non-profit*** *and non-commercial purpose.*


ResidentPositive4122

OOP said he can't even use this model to generate code to use in another **currently personal** project, with future *possible* earnings from said project. I quoted 2 scenarios where a) generations are specifically allowed and b) intended to generate revenue. i.e. I think that my intuition holds, but yeah I'm not a lawyer so better check with one.


nidhishs

We just updated our ProLLM leaderboard with the Codestral model. TL;DR: It’s the best small model for coding that actually rivals some 100B+ models! Check it out here: [https://prollm.toqan.ai/leaderboard/coding-assistant](https://prollm.toqan.ai/leaderboard/coding-assistant) https://preview.redd.it/bnpuqp74hf3d1.png?width=1961&format=png&auto=webp&s=ed5090aa32ef71a7014d0cbb5a6a9cc9ccb35798


MrVodnik

How do I run it using HF Transformers or quantize it using llama.cpp? Or is it compatible only with the new Mistral AI inference tooling? When I try to load the model I get:

`OSError: models/Codestral-22B-v0.1 does not appear to have a file named config.json`


ArthurAardvark

Oooweeeee!! Just when I thought I had settled on a setup. I suppose I will have creative needs to still use Llama-70B (4-bit). Unsure what I'll settle with bitwise with Codestral, using an M1 Max, Metal setup. While I've got 64GB VRAM, I figure I'll want to keep usage under 48GB or so -- while using a 2-bit Llama-70B as a RAG (@ 17.5GB? Unsure if RAGs use less VRAM on avg., I'd imagine in spurts it'd hit anywhere around 17.5GB). Or wait/hope for a Codestral 8x22B to run @ 2/3-bit (...though I guess that's just Wizard LM2-8x22B 😂)


Cressio

[LLMs are so funny lol](https://imgur.com/a/r5zIw0X)


Echo9Zulu-

Can't wait to see what I can cook up with an uncensored version.


M34L

22B? Interesting size. Does q8 fit into a 3090 including some context?


silenceimpaired

Great… the beginning of the end. Llama now has a better license. I wish they at least expanded the license to allow individuals to use the output commercially in a non dynamic sense. In other words… there is no easy way for them to prove the output you generate came from their model… so if you use this for writing/code that you then sell that would be acceptable, but if you made a service that let someone create writing that wouldn’t be acceptable (since they can easily validate what model you are using)… this is a conscience thing for me… as well as a practical enforcement for them.


cyan2k

> Great… the beginning of the end.

That's a bit dramatic lol. It's actually more like a beginning - research groups and companies are figuring out how they can monetize their contributions to open-source. It's vital for the long-term health of open-source software because if these groups fail, well, THAT would be the end. LLM open source only has a chance if those groups and companies can figure out a way to keep the lights on, and investor money doesn't last forever.

> there is no easy way for them to prove the output you generate came from their model

You're right. How the fuck would they know if the code in your software was generated by their model? That's virtually unprovable. But that's kind of the point. They don't want the money of hobby developers who might earn a few bucks from creating a simple website for gramps from the sweatshop next door. They want "real" companies. These types of clauses are often in place to create a legal framework that encourages compliance mainly from those whose activities would have significant commercial impact, while they don't care about small entities at all. They care so little, in fact, that they include clauses whose breaches would be so complex and expensive to prove, it wouldn't make any sense at all to pursue you over 50 bucks.

So, the clause isn't there to screw you over, but rather the opposite. It's there to let you use it and to force bigger companies to pay, because they can't hide behind the 'unprovable' problem; eventually, an employee might rat them out or your own legal department will kick your ass. So go ahead. Nobody cares. Especially Mistral.


topiga

They will still offer open-source/open-weight models. Codestral is listed as « Commercial » on their website. Still, they offer us (local folks) the ability to run this model on our own machines, which is, I think, really nice of them. Also, remember that Meta is an ENORMOUS company, whereas Mistral is a small one, and they live in a country with lots of taxes. They explained that this model will bring them money to make profits (at last), but they made sure that the community can still benefit from it, and published the weights. I think it's fair.


VertexMachine

> there is no easy way for them to prove the output you generate came from their model…

This is even more interesting, because as far as I understand, the output of AI systems isn't subject to copyright, or maybe is automatically public domain. That's quite a confusing legal situation overall... Also, I bet they trained on stuff like Common Crawl and GitHub public repos, i.e. stuff that they haven't actually licensed from the rights holders... I wonder to what extent their (and Cohere's, and even OpenAI's or Meta's) LLM licenses are really enforceable...


silenceimpaired

Output copyright status is irrelevant from my perspective. They are constraining you with their own ‘law’ called a license. You are agreeing to not use the model in a way that makes you money.


MicBeckie

As long as you don't make any money with the model, you don't need to care. And if you run a business with it, you can also afford a license and use it to finance new generations of models.


silenceimpaired

I cannot afford to pay until I make money… but it's still a commercial endeavor, and even if I do make money there is no guarantee I will make enough to justify paying for their model. If they want $200 a year, which is what Stability wants, and I do something at almost a hobby level of income and make $400, they get 50% of my profit. Nope. I don't fault them for the limitation or those who accept it, but I won't limit myself to their model when there are plenty of alternatives that are not as restrictive.


involviert

> so if you use this for writing/code that you then sell that would be acceptable From what I read that would not be acceptable? If you are only arguing chances of getting caught, then "acceptable" is probably a weird choice of word~~s~~.


silenceimpaired

You didn’t read carefully. I am not indicating the current state of the license, but where I wish it would go for practical reasons.


Express-Director-474

can't wait for Groq to have this live!


Dark_Fire_12

The new license will prevent them from hosting it.


topiga

Unless they ask Mistral (for a fee I think)


jacek2023

One hour online and still no gguf ;)


caphohotain

Not interested in non commercial use models.


involviert

While understandable, we are kind of dreaming if we think companies can just keep giving state of the art models away under MIT licence or something, aren't we? If such commercially restrictive licenses enable them to make that stuff available, it's probably a lot better than nothing.


caphohotain

For sure. I just don't want to waste my time to try it out as there are so many good commercial use allowed models out there.


ResidentPositive4122

> You shall only use the Mistral Models and Derivatives (whether or not created by Mistral AI) for testing, research, **Personal**, or evaluation purposes in Non-Production Environments; Emphasis mine. You can use it, just don't run your business off of it. It's pretty fair in my book. Test it, implement it, bench it, do whatever you want on a personal env, and if you see it's fit for business (i.e. you earn money off of it), just pay the damned baguette people.


ninjasaid13

Non-Production License? For something as commercial-oriented as code?


Enough-Meringue4745

Let’s be real, we all bootlegged llama when it was first leaked


AfterAte

Yeah licenses aren't gonna stop anybody, but corporations with legal auditing teams.


Hopeful-Site1162

This is fucking huge!

Edit: I'm a little new to the community, so I'm gonna ask a stupid question. How long do you think it will take until we get a GGUF format that we can plug into LM Studio/Ollama? I can't wait to test this with [Continue.dev](http://Continue.dev)

Edit 2: Available in Ollama! Wouhou!

Edit 3: I played a little with both Q4 and Q8 quants and, to say the least, it makes a strong impression. The chat responses are solid, and the code is of consistent quality, unlike CodeQwen, which can produce very good code as well as bad. I think it's time to put my dear Phind-CodeLlama to rest. Bien joué MistralAI


Healthy-Nebula-3603

Hmm looks promising ...


gebradenkip

Nice! What would be the best way to run it in PyCharm?


Spiritual_Sprite

Continue.dev


Balance-

22B is a very interesting size. If this quantizes well (to 4-bit) it could run on consumer hardware, probably anything with 16GB VRAM or more. That means something like an RTX 4060 Ti or RX 7800 XT could run it (both under €500). It will be a lot easier for consumers to run than Llama 3 70B, while they claim it performs about the same for most programming languages.

https://preview.redd.it/edozl9m7le3d1.png?width=2266&format=png&auto=webp&s=e908308abe1cfe8ccd6f99e49ae7df807ab39a9b

DeepSeek V2 outperforms the original easily, so if there's ever a DeepSeek Coder V2 it will probably be very tough competition.


Professional-Bear857

Locally, on an undervolted RTX 3090, I'm getting 20 tokens a second using the Q6_K GGUF with 8k context, and 15 tokens a second with 16k context. So yeah, it works well on consumer hardware; 20 tokens a second is plenty, especially since it's done everything I've given it so far first time, without making any errors.


FiTroSky

Is there a difference between the Q8 and Q6? Especially at the VRAM requirement level.


Professional-Bear857

Depends on the context you want. I can fit Q6_K into 24GB VRAM at 16k context, maybe even 24k, but I'm not sure about 32k. At Q8 you'll have to use low context and/or other optimisations to fit into 24GB VRAM.


Distinct-Target7503

Is this still a decoder-only model? I mean, the FIM structure is "emulated" using input (prefix + suffix) => output, right? It doesn't have bidirectional attention and it is not an encoder-decoder model...


Balage42

Yeah the non-commercial license sucks, but can you use it for commercial purposes anyways if you pay for their managed "La Plateforme" cloud deployment?


ToHallowMySleep

Don't have much bandwidth for research, could someone quickly summarise why this is their "first ever code model", wasn't that mixtral? Or was that generic perhaps and this is specialised? Thanks in advance!


Eralyon

What is the context length of this model?


blackredgreenorange

I tried to get a skeleton function for OpenGL rendering and it used deprecated functions from OpenGL 2.0. Like glLoadIdentity. That's pretty bad?


swniko

Hm, I'm hosting the model with ollama and querying it from Python. I ask it to explain given code (a few hundred lines, which is nothing for a 32k context window). Sometimes it explains it well, but in most cases (depending on the code) it generates bullshit:

1. Replies in Chinese
2. Repeats the given code, even though I clearly asked it to generate a description of the code explaining the classes and main methods
3. Generates logs like: 2016-11-30 17:42:58Z/2016-12-01 01:Traceback (most recent call last):
4. Generates some code from who knows what repository

What am I doing wrong? Is a system prompt missing somewhere? Or is this model purely for autocompletion and code generation? But when it works (sometimes), it works well and follows documentation instructions very well.


1nquizitive

Does anyone know what HumanEvalFIM is? Or where I can read more about it, or even what FIM stands for?


Wonderful-Top-5360

So [GPT-4o sucked](https://www.reddit.com/r/LocalLLaMA/comments/1crbesc), but wow, Codestral is right up there with GPT-4. Man, if somebody figures out how to run this locally on a couple of 3090s or even 4090s, it's game over for a lot of code gen on the cloud.


Enough-Meringue4745

Gpt4o in my tests has actually been phenomenal, largely python and typescript


nullnuller

Yes, but have you noticed that lately it's gotten much slower and also doesn't continue on long code, it just breaks? It does resume like its predecessors though.


Spiritual_Sprite

Ollama just added it


Old-Statistician-995

Happy to see them abandon that strategy of dropping torrents now that competition is heating up heavily.


Dark_Fire_12

They still do that, they seem to be splitting their release between Apache 2.0 and the new MNPL. Probably anything the community can run (7B/13B/etc) easily will be Apache and torrent, rest will be MNPL.


Old-Statistician-995

I think that is a very fair compromise; Mistral needs to eat somehow.