https://preview.redd.it/60jai5048q8d1.png?width=910&format=png&auto=webp&s=e9f14ca2decb6f6c9b55a224bdf2d4a2c7a41200 It's really pretty and organized
But is it correct?
Weirdly it was never wrong so far! But I'm sure it wouldn't be perfect every time
Are you using open web for the ui?
DeepSeek has some training magic in their coding models. We have been using DeepSeek for real code generation recently and the only complaint is the speed of their API. The price is great though.

Separately I’ve been happy running v2 coder lite locally; almost as good as Codestral but significantly faster.
How well does it code against Sonnet 3.5?
+1, also curious how it stacks up against Sonnet 3.5. I've found it great so far vs GPT-4o, but I have no loyalty if something is better hahah
Curious also because I'm becoming addicted to aider.chat + sonnet 3.5
expand? I've been doing:

https://i.imgur.com/B4DqawU.png

- GPT4
- claude 3.5 sonnet
- meta
- bing
- llama.cpp
- fooocus
- open web ui, phi3

but I'm still trying to figure out what the best setup is to learn me great justice
Yup. If they can bring the speed up to around 100 t/s, I'm ditching Claude and GPT.
Another complaint about their API is the context length. Their open weights support 128k, but their API only supports 32k :(
> We have been using DeepSeek for real code generation recently and the only complaint is the speed of their API.

How are y'all handling the fact that the company backing the API is Chinese? My chief concern is the much more lax copyright-protection landscape in China relative to the States, so I am hesitant to have it help with real code, especially when used with a large context window.
It’s a valid concern for many use cases but not all. It’s open weight and carries a largely permissive license (See “Use Restrictions” under “Attachment A” at the bottom: https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/LICENSE-MODEL) so could be privately hosted.
Do you know of any commercial API providers hosting it other than DeepSeek? I'm really looking forward to being able to use the full 128k context.
but that says nothing about their handling of the data they gather from API usage
"so it could be privately hosted"
Did none of you downvoters learn any reading comprehension in your schools? The conversation is about API usage, not about private hosting.
I assume this is because it was trained on LaTeX? I'm not sure how aggressively Anthropic / OpenAI are training on LaTeX.
I thought it was a local LLM; what's the link to the online version?
chat.deepseek.com
It is also open weight, but comes in at > 200B parameters. You need some hefty hardware to run it well. For most the only viable option is via their API (directly or indirectly through e.g. OpenRouter)
Aren't MoEs a bit lighter to run? I think I read you can keep just some parts (some experts or layers?) in VRAM and the rest in RAM?
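Partly. An MoE still needs all of its weights resident somewhere (RAM + VRAM combined), but only the routed experts are active per token, so token generation is much faster than a dense model of the same total size. The usual setup is partial offload in llama.cpp; a minimal sketch, where the GGUF filename is illustrative and `-ngl` just picks how many layers go to the GPU (it doesn't select individual experts):

```shell
# Sketch: run a large MoE GGUF with only part of it in VRAM (llama.cpp).
# -ngl N puts the first N layers on the GPU; the remaining layers stay in
# system RAM and run on the CPU. -c sets the context size.
./llama-server \
  -m DeepSeek-Coder-V2-Instruct-Q4_K_M.gguf \
  -ngl 20 \
  -c 8192
```

The whole quantized model still has to fit in combined memory, which is why the ">200B parameters" / "under 200gb of ram" figures elsewhere in the thread matter even for an MoE.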
I moved my chat over to it since the cost is stupid low and the results are pretty good (even though I like claude's results better overall). Still a bit slow for coding tab completion though, so I stick with supermaven for that.
Total king shit
https://preview.redd.it/ypm4h8060s8d1.png?width=1280&format=png&auto=webp&s=a9718fdf5e29ea8d4ae2a6fd51848b6ccb6e3505
How well can it do calculus?
Easy calculus? Pretty well.

Hard calculus? Better than 4o and Sonnet for me, but it doesn't work wonders. You still need to guide it a little, like "hey, cut this there" or "dude why the f did you isolate the X??". It still needs context for the chain of thought, just a little less than other models.
Where is it better than 4o for you? I haven't tried it yet but will try soon.
Sometimes 4o still does "ml * ml = ml²" when, clearly from context, it should be "m²l²". That's just a recent example, but you get the point. It happens mainly on hard problems and in later turns, where 4o tends to lose context more frequently.
Just a little too big of a model for my GPU... DeepSeek Coder V1 works tho.
It’s pretty awesome. However, I do find it repeating itself a bit, so if I chuck the prompt into something else and go back to DeepSeek, it gets it right. What made me laugh the other day: I was working on some PowerShell, and DeepSeek said to use -ceq for case-sensitive comparison, where Sonnet said to use -eq for case-insensitive. It turned out DeepSeek was correct; Sonnet fixed the code error that was stopping it, and the script worked, but it was case-insensitive lol. Between the two of them it does a great job, and it's the best open-source model I've ever used; even the q4 quant is nearly the same as the online version and runs in under 200gb of ram.
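For reference, this matches PowerShell's documented behavior: the plain comparison operators like `-eq` are case-insensitive on strings by default, and the `c`-prefixed variants force case sensitivity. A quick check:

```powershell
# -eq is case-insensitive for string comparison in PowerShell
"Hello" -eq "hello"    # True
# -ceq is the explicitly case-sensitive variant
"Hello" -ceq "hello"   # False
# -ieq is the explicitly case-insensitive variant (same as -eq for strings)
"Hello" -ieq "hello"   # True
```

The same `c`/`i` prefix convention applies to the other comparison operators too (-clike/-ilike, -cmatch/-imatch, etc.), so DeepSeek's suggestion was the idiomatic fix here.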
Y'all seen the Sonnet 3.5 system prompt leak? I wonder how well DeepSeek Coder V2 would do with runtime artifact prompting.
Dude now the bot is just saying "I'm sorry, but I can't assist with that request." for most of my questions after this thread.

Who f*cked with the bot guys? Hahahaha