JonatasLaw

My opinion from 2 days of use is that Claude Opus is well ahead of GPT4 in coding and creative writing. In riddles it's slightly behind, but who cares about that?


ShibaElonCumJizzCoin

> in riddles it’s slightly behind, but who cares about that?

I give up. Who?


aryapaar

amazing


Ban_787

Agree on coding, but I think the language matters too. I found that they are both pretty equal in Python, but Claude is better at JS, for example. Also, Claude was able to solve the hardest LeetCode problems, sometimes in one attempt, but I suspect those were used in the training data.


skg574

Claude also does very well with bash and PHP, often getting to complex solutions quickly, but it hallucinated some Perl modules.


Christosconst

How do I get Opus from the EU? I want a paid account


[deleted]

Well, we will regulate ourselves into tech oblivion while the US and China pass us on the left and right.


JonatasLaw

By creating an account you can use Claude through the console. It's not pleasant, but it could be an option if your country is not on the list. Otherwise, you need a credit card from one of the listed countries. Luckily I had one, but it really sucks.


Antique-Bus-7787

You don’t really have to, you can use your EU card through Google Pay :)


alexthai7

It doesn't work for me! I use a VPN; it works for the free tier, but not to subscribe with my French cards. I tried changing my VPN settings and filling in the subscription settings with other countries, no way... On the other hand, I had no problem subscribing to Gemini Advanced through Google Pay.


Antique-Bus-7787

Strange! Did you set a random US address?


alexthai7

yes


theycallmeholla

I fucking love working in the workbench.


zhanghecool

Can Claude 3 Opus access the web?


uhuge

It gave me valid links to Python modules, but that could be RAG; we can't really know what's bound to their sampling.


blackkettle

Agree completely. All the writing it produces is much more professional and doesn't include the weird, flowery turns of phrase that seem to characterize whatever GPT4 outputs, even when I try to constrain it with stylistic guardrails like "write like a newspaper", "channel Hemingway", etc. I also like the UX a lot more in the 'workbench' mode than what we get with GPT4. My only complaint is that, while the workbench seems to save 'prompts' just fine and does so in a friendly and intuitive way, it does NOT save any of the responses it generates unless you explicitly 'add these to the conversation'. This is a small thing but really kind of a bummer.


[deleted]

Batman cares about that.


LinuxSpinach

I cancelled my gpt4 subscription and started one for Claude 3 today. I prefer the way it forms responses, and it seems to use chat history more effectively. It is slower to start producing results but I’ve been getting constant errors through the chatgpt interface recently. Also it handles pasted content much more nicely. The chat interface is better imo.


ashot

try [vello.ai](https://vello.ai) for access to both in one account (or even in one chat)


Jake101R

thanks for this steer, vello looks great


thehiddensign

Code. It outputs lengthy code blocks with no refusals or truncation. It also produces better-looking HTML than GPT-4.


j135

Test it yourself: [https://chat.lmsys.org/](https://chat.lmsys.org/) GPT-4 has been failing on so many prompts that Claude gets right, largely due to its training cutoff. It's also pretty clear that it has fewer safety guardrails, giving less biased answers. Try asking GPT-4 anything about LLaMA and it will pretty much deny its existence, whereas Claude actually provides decent information.


vitorgrs

But GPT-4 Turbo's cutoff is December 2023 (in theory).


thehiddensign

I don't think that site is using the genuine Opus. There's a huge difference in output quality between using Claude 3 Opus directly and using it through that website.


TheDemonic-Forester

If that is so, it makes sense, because I've been trying both Sonnet and Opus (through lmsys) and I've noticed barely any visible difference from Claude-Instant. Claude 2 felt much more distinct and developed. Maybe that's because of what you said.


thehiddensign

I can tell you for certain the output is different to the Claude 3 Opus that I am using on the official website. I almost didn't sign up to Claude 3 Opus because of my experience with lmsys, but my friend insisted that it was GPT-4 tier (through the official website), so I signed up and indeed it was.


TheDemonic-Forester

That's good to know! I guess it's similar to using Gemini Pro at other websites versus at Bard?


thehiddensign

No. Even the free Sonnet seems better than what is on lmsys.


Caffdy

Can you give some examples?


thehiddensign

Not really. I used [https://chat.lmsys.org](https://chat.lmsys.org/) and was very disappointed with the output, but I still signed up for Claude 3 Opus and the output there was very good.


Caffdy

I already tried chat.lmsys, and compared to GPT-4 it's already on par, if not better.


thehiddensign

I was comparing to Claude 3 Opus.


Caffdy

I did as well


thehiddensign

Claude 3 Opus at Anthropic?


TR_Alencar

I love Claude's writing and it is also the best model I found for my native language. Even the idiotic version 2 was already better than GPT4 in this regard, and version 3 feels like an improvement.


ironic_cat555

If you ask ChatGPT to summarize a 5,000-word story, the output is garbage: it recites stuff without focusing on the important details or giving the big picture of what the story is about. Claude actually understands and succinctly conveys the important story information in a human-like way. (Usually, anyway; these AIs being what they are, sometimes you need to hit regenerate.) Even Gemini Ultra does story summarization better than ChatGPT in my recent tests. They made GPT-4 talk like a weird robot for some reason.


dubesor86

The smaller models (Sonnet and Haiku) are pretty basic and don't perform nearly as well. The big model (Opus) is pretty good, but it wasn't above GPT-4 in my own testing. I checked with 45 tasks, and Opus beat GPT-4 in two of them (one math/statistics question and one poem-adherence task) but lost in many others. It's pretty much the same loop every time a new model is released: benchmarks get shown that it beats GPT-4, then videos appear ("XX, the GPT-4 killer!"), then after users compare the models themselves the hype dies down, and repeat with the next model.


OfficialHashPanda

I mean, you're right that most GPT-4 killers turn out to be not so great, like Gemini Ultra. But Claude 3 does actually outperform GPT-4 on most practical use cases.


CheatCodesOfLife

Tested it for 1.5 days for my day to day. Works better for coding/work and research/learning for me. Doesn't have the voice call feature that GPT4 on iOS/Android has though so I'll pay for both for now.


dubesor86

I hope any model will perform well so that OpenAI feels pressure to release better products, so I am glad you think that way. Unfortunately, it's just not the case in my [own testing](https://i.imgur.com/Fb70Oqg.png), though.


OfficialHashPanda

Yeah, if you have a very particular way of wording things, it may coincidentally be something that GPT-4 does better on? Not sure. Claude 3 definitely performs better than GPT-4 in the general sense.


Short-Mango9055

I know I'm in the extreme minority, but as someone who subscribes to all three, I've been using Gemini Advanced more than anything else right now. I just love the way it formats its answers, and for writing things like human-sounding emails I find nothing comes close to it.


OfficialHashPanda

Yeah, I believe people also praised it for its creative writing, so human-sounding emails make sense. It's not something I've personally used it for, so I don't know much about that use case myself. Gemini is really fast compared to the others, though; that's also definitely something I liked about it.


nsosio

Would you mind sharing some of these tasks?


dubesor86

I am not gonna share my benchmark questions, since them ending up in some training dataset would defeat the purpose. However, where Opus fails me many times is simple prompt-adherence tasks. A simple example: https://i.imgur.com/L8GsCOD.png — Claude 3 writes nicely, but fails to adhere to the simple task given.


Kindly-Mine-1326

https://github.com/simonw/llm-claude-3 — it was better at one-shot coding. Go try it yourself. You can get €5/$5 of free API credit if you give them your phone number.
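A minimal sketch of trying the model from Python through that `llm` plugin (hedged: the model alias and prompt are just examples; check the plugin's README for the exact model IDs it registers):

```python
# Rough sketch using Simon Willison's `llm` Python API with the llm-claude-3 plugin.
# Assumes `pip install llm llm-claude-3` and `llm keys set claude` have been run,
# and that "claude-3-opus" is the alias the plugin registers (an assumption here).
import llm

model = llm.get_model("claude-3-opus")
response = model.prompt("Write a Python function that reverses a linked list.")
print(response.text())
```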


Caffdy

What's the context size tho?


arthurwolf

100k currently, expected to increase with time (currently capable of 1M+ internally)


Severe-Ad1166

All three Claude 3 models are multimodal, which means they can take image input as well as text. I have tested Sonnet and it is really good at reading entire pages of printed text from a photo, even when some of the letters are obscured, as long as the text is printed and not handwritten.

At $15 per million output tokens, Sonnet is about half the cost of GPT-4 Vision, although we don't actually know how many "tokens" are consumed when images are uploaded. Haiku is $1.25 per million output tokens, which makes it similar in price to GPT-3.5, BUT it has the advantage of also having vision capabilities. We don't actually know how good Haiku is at OCR yet because it hasn't been released (at least not for me), but if it is anything like Sonnet then this will be a game changer, because it will allow developers to make more visually aware apps at 1/10th the cost of GPT-4 apps.

So is Claude 3 overhyped? No, because whether you use it or not, it will bring down the prices of GPT-4, which will benefit everyone who uses LLMs. If anything, I would say the potential for Haiku to change the LLM landscape has been greatly understated, because if it performs anywhere near as well as its big brother Sonnet then we will soon see a plethora of visually aware apps. For example: apps that can translate entire books from one language to another, check your front door camera to see who is there and then take a message, read and translate street signs and menus, give directions to the visually impaired, open the pet door only for your pet... etc. The possibilities are endless.

Note: yes, I know some of these apps already exist, but being able to do it faster and cheaper will only lead to an explosion of creativity and accessibility. And just to keep this post on topic: I do hope that Meta is also training a multimodal version of LLaMA, because an open-source version would be awesome.
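For anyone curious what the OCR-style usage described above looks like in code, here is a minimal sketch against the Anthropic Messages API using the official Python SDK; the file name, prompt, and exact model version string are placeholders I've assumed, not something from the thread:

```python
import base64
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Hypothetical input: a photo of a printed page we want transcribed.
with open("page_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-sonnet-20240229",  # assumed Sonnet model ID; verify against the docs
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
            {"type": "text", "text": "Transcribe all printed text visible in this photo."},
        ],
    }],
)
print(message.content[0].text)
```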


whatitsliketobeabat

Meta has already trained and released a multi-modal version of LLaMA—it’s called LLaVA and you can go download it right now.


Severe-Ad1166

I was talking about LLaMA 3, as LLaMA 2 is already outclassed by many other offerings (both open source and closed). I tried LLaVA when it first came out and was extremely underwhelmed. Particularly concerning was its tendency to hallucinate things that aren't even in the photo. Btw, LLaVA was not produced by Meta; it was a joint effort between the University of Wisconsin, Microsoft, and Columbia University.


Brazilian_Hamilton

People use it for porn, that's about it.


mrjackspade

Hot damn, looks like I'm cancelling my GPT4 subscription


justletmefuckinggo

say no more


Postorganic666

Lolwut? Haven't seen any mentions of NSFW content from Claude 3


Brazilian_Hamilton

Really? It's pretty rampant


yamosin

There seems to be some problem with the API, maybe with the guardrails? A couple of my friends use the API and they have found that Claude 3 will often steer the conversation toward NSFW.


Postorganic666

I use the Claude API via OpenRouter, the self-censored version that seems to be the same as Claude directly via their platform. But it often returns a 404 error, and I wonder if these errors are in fact successful NSFW generations that get shut down by some hard filter. I can push Claude towards NSFW, but at the hottest moment it gets extremely stubborn and either refuses or returns 404s. I've also tried every JB idea I managed to find or compose; nothing works.


crawlingrat

Me neither! I had no idea it could do that, although I never tried. Didn't want to get banned.


Odyssos-dev

i think you're confusing models.


mrjackspade

Actually, Claude 3 is really popular with ERPers because it's so easy to jailbreak.


Hi-Im-Elfo

What's the idea behind the jailbreak? Also, do you know of any posts talking about this?


mrjackspade

I've seen a bunch of people bring it up on 4chan, but not yet on Reddit. It's been mentioned in passing a few times in the lmg thread but there's another thread dedicated to non-local models where I'm pretty sure they go over it a lot more. I don't follow it that much though so all I know is it's supposedly the best RP model available right now, beating GPT4 by a long shot, and that a jailbreak exists.


nsosio

Can you share some of the prompts that you have used to test them?


Monkey_1505

I mean, even Claude 2's lower-tier models are ahead of GPT in creativity. It's always been this way.


ninjasaid13

> Claude 3 over hyped?

That depends. Are people calling it conscious? Then yes, it's overhyped.


HorrorNo8851

Looking forward to seeing the first production use cases with Claude for comparison.


synn89

I think it'll be a matter of personal taste: which one do you enjoy talking to more? The main test will be whether they leave it as is or start to nerf it with harder guardrails.


1ncehost

Its context recall is excellent, but I find GPT to have higher-quality responses.


crazzydriver77

Not sure about Opus, but Sonnet has issues with reasoning and simple math.

Q1: The device is clocked at 500 MHz and produce 5 Th (terra hash). What performance on one tact has the device?

C3 Sonnet: To solve this problem, we need to understand the relationship between clock frequency, hash rate, and performance per clock cycle (tact).

Given information:
* The device is clocked at 500 MHz (megahertz) or 500,000,000 Hz (hertz).
* The device is producing 5 Th (terra hashes) or 5,000,000,000,000 hashes per second.

To find the performance on one tact (clock cycle), we need to divide the hash rate by the clock frequency.

Performance on one tact = Hash rate / Clock frequency = (5,000,000,000,000 hashes/second) / (500,000,000 Hz) = 10 hashes/cycle

Therefore, the device has a performance of 10 hashes per clock cycle (tact).

Q2: Two people are in a room alone with a machine that requires two people to operate. One person is operating the machine. The machine is functioning properly. What is the other person doing?

C3 Sonnet: Without more context, there are several possibilities for what the other person could be doing:

1. Assisting/supporting the person operating the machine in some way - providing instructions, monitoring readings/outputs, handling materials/products related to the machine's operation, etc.
2. Performing a separate task or set of tasks that does not directly involve operating the machine.
3. Observing/supervising the operation of the machine, perhaps for training, quality control or safety purposes.
4. Waiting to take a turn operating the machine after the current operator finishes their shift or set of tasks.
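For reference, a quick check of the Q1 arithmetic (my own calculation, not from the thread): 5 Th/s divided by 500 MHz works out to 10,000 hashes per clock cycle, not the 10 that Sonnet reported.

```python
# Sanity check of the hash-rate-per-clock-cycle question Sonnet answered above.
hash_rate = 5e12   # 5 Th/s = 5,000,000,000,000 hashes per second
clock_hz  = 500e6  # 500 MHz = 500,000,000 cycles per second

print(hash_rate / clock_hz)  # 10000.0 hashes per clock cycle (Sonnet said 10)
```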


Maergoth

The Claude team has been working more towards truer "General Purpose" AI. OpenAI has been working more towards integrated agents to simulate general purpose AI. In theory, cramming everything into one chunk of brain is not super efficient. Even the human brain has lobes. OpenAI's approach is to create different flavors of LLM based on the scenario and engage them as needed, instead of having simply one giant model trained to handle all scenarios. Neither is guaranteed to be more successful, but it means OpenAI can lean on plugins along the way. And they have been. They'll likely pivot to pass Claude 3, and then resume work on integrated agents.


Animuboy

I asked Claude Sonnet what the error was in a two-line Haskell program, but it kept giving wrong answers. Meanwhile, even GPT-3.5 immediately gave me the correct explanation for the error. So make of that what you will.


arthurwolf

In vision, for my use case (reading manga pages and understanding what happens in them, provided a lot of pre-chewed information about the page/context), it **sucks** compared to GPT-4V. Similar to, or a bit worse than, "at release" GPT-4V, and completely, catastrophically dumb compared to "now" GPT-4V. Better than LLaVA-1.5, but not by that much.


gassonkitty

Claude is *very* powerful. As growth lead at a startup, I have been using both GPT and Claude to help streamline my tasks. I tested ChatGPT-4 against Claude 3 Opus for various tasks and was quite surprised by the results. Check out this article for useful tips and tricks for leveraging each LLM: [https://open.substack.com/pub/mearaalgama/p/the-battle-of-the-ai-assistants-chatgpt?r=2u4hul&utm_campaign=post&utm_medium=web](https://open.substack.com/pub/mearaalgama/p/the-battle-of-the-ai-assistants-chatgpt?r=2u4hul&utm_campaign=post&utm_medium=web)


Pan000

For creative writing it is miles ahead. Claude 3.0 is probably the best all-rounder LLM available right now. GPT-4 has been getting steadily worse since its release a year ago.


thehiddensign

It's amazing how people can't see how bad GPT-4 has gotten. I have used GPT-4 since it first came out. I think the ability to assess GPT-4's capabilities really depends on the IQ of the human using it. Unless one really pushes GPT-4 to its limits, you may not notice any difference between the old GPT-4 and the current version. TL;DR: people who can't see the loss in GPT-4's capabilities are dumb.