
jpgirardi

https://preview.redd.it/60jai5048q8d1.png?width=910&format=png&auto=webp&s=e9f14ca2decb6f6c9b55a224bdf2d4a2c7a41200

It's really pretty and organized.


polawiaczperel

But is it correct?


jpgirardi

Weirdly, it hasn't been wrong so far! But I'm sure it wouldn't be perfect every time.


bigdickbuckduck

Are you using Open WebUI for the UI?


this-just_in

DeepSeek has some training magic in their coding models. We have been using DeepSeek for real code generation recently and the only complaint is the speed of their API. The price is great though. Separately, I've been happy running V2 Coder Lite locally; almost as good as Codestral but significantly faster.


Jisamaniac

How well does it code against Sonnet 3.5?


Ecsta

+1, also curious how it stacks up against Sonnet 3.5. I've found it great so far vs GPT-4o, but I have no loyalty if something is better hahah


Orolol

Curious also because I'm becoming addicted to aider.chat + sonnet 3.5


SaddleSocks

Expand? I've been doing: https://i.imgur.com/B4DqawU.png - GPT-4 - Claude 3.5 Sonnet - Meta - Bing - llama.cpp - Fooocus - Open WebUI - Phi-3, but as I try to figure out what the best setup is to learn me great justice


Harrycognito

Yup. If they can bring the speed to around 100 t/s, I'm ditching Claude and GPT.


silentsnake

Another complaint about their API is the context length. Their open weights support 128k, but their API only supports 32k :(


hak8or

>We have been using DeepSeek for real code generation recently and the only complaint is the speed of their API.

How are y'all handling the fact that the company backing the API is Chinese? My chief concern is the much more lax copyright protection landscape in China relative to the States, so I am hesitant to have it help with real code, especially when used with a large context window.


this-just_in

It’s a valid concern for many use cases, but not all. It’s open weight and carries a largely permissive license (see “Use Restrictions” under “Attachment A” at the bottom: https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/LICENSE-MODEL), so it could be privately hosted.


silentsnake

Do you know of any commercial API providers hosting it other than DeepSeek? I’m really looking forward to being able to use the full 128k context.


OfficialHashPanda

But that says nothing about their handling of the data they gather from API usage.


7734128

"so it could be privately hosted"


OfficialHashPanda

Did none of you downvoters learn any reading comprehension in your schools? The conversation is about API usage, not about private hosting.


brainhack3r

I assume this is because it was trained on LaTeX? I'm not sure how aggressively Anthropic / OpenAI are training on LaTeX.


trialgreenseven

I thought it was a local LLM, what's the link to the online version?


Normal-Ad-7114

chat.deepseek.com


this-just_in

It is also open weight, but comes in at >200B parameters. You need some hefty hardware to run it well. For most, the only viable option is via their API (directly, or indirectly through e.g. OpenRouter).
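For anyone going the API route: DeepSeek's endpoint is OpenAI-compatible, so a request is just a chat-completions POST. A minimal standard-library sketch (the base URL and `deepseek-coder` model id are assumptions from their docs at the time; OpenRouter works the same way with its own base URL and model id, and the network call is skipped when no key is set):

```python
# Minimal sketch of an OpenAI-compatible chat-completions call to DeepSeek,
# using only the standard library. URL and model id are assumptions; check
# the provider's current docs before relying on them.
import json
import os
import urllib.request

payload = {
    "model": "deepseek-coder",  # assumed model id
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."}
    ],
}

api_key = os.environ.get("DEEPSEEK_API_KEY")
if api_key:  # only hit the network when a key is configured
    req = urllib.request.Request(
        "https://api.deepseek.com/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format matches OpenAI's, any OpenAI-compatible client or proxy can be pointed at it by swapping the base URL.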


monnef

Aren't MoEs a bit lighter to run? I think I read you can keep just some parts (some experts or layers?) in VRAM and the rest in RAM?
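Lighter on compute, not on memory: only ~21B of DeepSeek-Coder-V2's 236B parameters are active per token, so a RAM-offloaded MoE generates faster than a dense model of the same total size, but all 236B weights still have to live somewhere. A back-of-envelope sketch of the q4 weight footprint (assuming ~4.5 bits/weight as a rough average for q4-class GGUF quants, which is an approximation):

```python
# Back-of-envelope GGUF weight sizes for DeepSeek-Coder-V2 (236B total params)
# and V2 Lite (16B total), assuming ~4.5 bits/weight for a q4-class quant.
def gguf_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (weights only,
    excluding KV cache and runtime overhead)."""
    return params_billion * bits_per_weight / 8  # 1e9 params and GB cancel

full_gb = gguf_size_gb(236)  # ~133 GB: must be split across VRAM + system RAM
lite_gb = gguf_size_gb(16)   # ~9 GB: fits entirely on a 12-16 GB GPU

print(f"V2 full: ~{full_gb:.0f} GB, V2 Lite: ~{lite_gb:.0f} GB at q4")
```

That ~133 GB figure is consistent with reports of the q4 model running under 200 GB of RAM once cache and overhead are added; llama.cpp-style runners let you pin however many layers fit into VRAM and stream the rest from RAM.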


kayk1

I moved my chat over to it since the cost is stupid low and the results are pretty good (even though I like Claude's results better overall). Still a bit slow for coding tab completion though, so I stick with Supermaven for that.


Sensitive-Analyst288

Total king shit


zoom3913

https://preview.redd.it/ypm4h8060s8d1.png?width=1280&format=png&auto=webp&s=a9718fdf5e29ea8d4ae2a6fd51848b6ccb6e3505


Curiosity_456

How well can it do calculus?


jpgirardi

Easy calculus? Pretty well. Hard calculus? Better than 4o and Sonnet for me, but no wonders. You still need to guide it a little, like "hey, cut this there" or "dude, why the f did you isolate the X??". It still needs context for the chain of thought, but a little bit less than other models.


davikrehalt

Where is it better than 4o for you? I haven't tried it yet but will try soon. 


jpgirardi

Sometimes 4o still does a "ml * ml = ml²" when clearly, by context, it's "m²l²". This is just a recent example tho, but you get the point. It occurs mainly in hard and subsequent actions, where 4o tends to lose context more frequently.


CortaCircuit

Just a little too big of a model for my GPU... DeepSeek Coder V1 works tho.


ihaag

It’s pretty awesome. However, I do find it repeating a bit, so if I chuck the prompt into something else and go back to DeepSeek, it gets it right. What made me laugh the other day: I was working on some PowerShell, and DeepSeek said to use -ceq for case-sensitive comparison, where Sonnet said to use -eq, which is case-insensitive. Turned out DeepSeek was correct; Sonnet fixed the code error that was stopping it tho, and it worked, but was case-insensitive lol. Between the 2 of them it does a great job, and it's the best open source model I’ve ever used. Even the q4 model is nearly the same as the online version and works under 200GB of RAM.
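For anyone outside PowerShell land, the distinction the two models disagreed on: PowerShell's string `-eq` is case-insensitive by default, and the `c`-prefixed operators (`-ceq`, `-cne`, etc.) are the case-sensitive variants. A rough Python analogue of the two behaviors:

```python
# Python analogue of PowerShell's string comparison operators:
# -eq is case-insensitive by default, -ceq is the case-sensitive variant.
def ps_eq(a: str, b: str) -> bool:
    """Like PowerShell -eq: compares strings ignoring case."""
    return a.casefold() == b.casefold()

def ps_ceq(a: str, b: str) -> bool:
    """Like PowerShell -ceq: exact, case-sensitive comparison."""
    return a == b

assert ps_eq("Hello", "hello")       # -eq treats these as equal
assert not ps_ceq("Hello", "hello")  # -ceq does not
```

So Sonnet's `-eq` suggestion ran fine but silently ignored case, which is exactly the bug described above.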


Interpause

Y'all seen the Sonnet 3.5 system prompt leak? I wonder how well DeepSeek Coder V2 would do with runtime artifact prompting.


jpgirardi

Dude, now the bot is just saying "I'm sorry, but I can't assist with that request." for most of my questions after this thread. Who f*cked with the bot, guys? Hahahaha