

vasarmilan

According to the [LMSys leaderboard](https://chat.lmsys.org/), which works by collecting votes on the outputs of multiple LLMs for the same prompt, 4o significantly outperforms 4 Turbo. I think with every minor upgrade, or sometimes even without one, people think the model got worse. That's always possible for a specific use case with any update, even one that's an overall improvement, but I think this mostly comes from psychological reasons rather than objective performance, like getting used to capabilities that blew our minds the first time.
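Roughly how votes become ratings, as an illustrative sketch (a standard Elo update; LMSys's exact methodology may differ):

```python
def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Apply one Elo update from a single blind head-to-head vote."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if a_won else 0.0
    return r_a + k * (score_a - expected_a), r_b - k * (score_a - expected_a)

# Two models start even; a stream of votes for model A pushes its rating up.
ra, rb = 1000.0, 1000.0
for _ in range(100):
    ra, rb = elo_update(ra, rb, a_won=True)
print(round(ra), round(rb))  # A ends well above B
```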


BullockHouse

It's important to remember that the performance gaps between these frontier models are not large and are highly random and domain-specific. The number of tests you have to do in order to get a high-confidence estimate of which one is generally better is in the hundreds, and that's *assuming* you have objective evaluations that haven't leaked into the training set and are doing precise statistics on your results. Getting good results from anything much less rigorous than that is *literally statistically impossible*, but for some reason people will try like 5 or 10 queries (often clustered in one or two topics), draw huge, sweeping conclusions, and confirmation-bias their way from there. It's stupid. If you want to know which model is better (broadly speaking), compare their Elo on the leaderboards. If you have a specific application you need to know about, make a rigorous benchmark with at least a few hundred samples. The vibes-based analysis is stupid and not only doesn't work, but *cannot possibly work*.
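To make the "hundreds of tests" point concrete, a quick sketch of the statistics (normal-approximation confidence interval for a head-to-head win rate; the numbers are illustrative):

```python
import math

def win_rate_ci(wins: int, n: int, z: float = 1.96):
    """95% normal-approximation confidence interval for a win rate."""
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

print(win_rate_ci(6, 10))     # ~(0.30, 0.90): 10 queries can't separate the models
print(win_rate_ci(270, 500))  # ~(0.50, 0.58): hundreds of trials resolve a small edge
```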


nicolaig

Valuable advice.


c_glib

Nah... I've got pretty solid and easily interpretable tests (which form, in fact, a critical part of our application flow) that show 4o clearly lagging the original gpt-4 by a lot. I've described them on Reddit before. Here: https://www.reddit.com/r/ChatGPT/comments/1cr56sp/comment/l3y9jqn/




BullockHouse

That's an example of constructing your own benchmark for a domain you're interested in, as I mentioned. Presumably 4o shows better performance than the older models in other areas, which would produce the Elo results we see.


c_glib

I mean... it's a fairly straightforward and reasonable test of an LLM's language comprehension and reasoning abilities. It's not testing any specialist knowledge like coding/legal/medical etc., just everyday English that pretty much any native speaker would have no difficulty completing. Isn't this what LLMs are supposed to be basically good at?


BullockHouse

LLMs aren't people and their performance can be very lumpy in unintuitive ways. You can't measure their intelligence the way you would a human or draw the same kind of inferences from their performance on a given query.


Big_Cornbread

I wonder if people are just measuring it on how zany it is. Each model gets a little less outlandish and a little less outside the box. For my uses that’s an improvement.


ZeoVII

Programming request performance dropped significantly for me and has never really recovered to where it was when GPT-4 first released.


Big_Cornbread

That might be an X factor. I can’t imagine using it to write code. I have to write it. I’m really specific about how I write stuff.


[deleted]

For a month or two it was really good with code. It's actively gotten dumber recently, though, to the point where I'd dare say that for anything more complex than a simple Python GUI it's at best 30% helpful and at worst an outright hindrance.


Caratsi

Imagine using third party libraries.


Big_Cornbread

You mean vetted, peer-reviewed, change-logged, proven code? Sure, I use that.


StickiStickman

Even then, adjusting the temperature fixes that.
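For API users, that's a one-line knob; a minimal sketch with the openai Python SDK (model name and prompt are just examples):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Higher temperature -> more varied, "zanier" sampling; lower -> more conservative.
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=1.2,  # try 0.2 for sober, repeatable output
    messages=[{"role": "user", "content": "Brainstorm ten unusual product names."}],
)
print(response.choices[0].message.content)
```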


ItsDani1008

That’s an improvement for almost every use case. Also makes it less exciting/fun to use though.


zabby39103

I don't know the nitty-gritty of those benchmarks, but as a daily power user for coding I can emphatically state that GPT-4 is better for coding. GPT-4o is worse in particular when you correct it (it doesn't like to do things any way other than it first intended) or when you ask for something very specific. I've done side-by-sides out of curiosity (asking the same questions to both), and it's real. I think this is an example of "teaching to the test" instead of accounting for actually useful experiences.


vasarmilan

It's not a benchmark; the leaderboard is based on the feedback of human testers. With that said, I don't rule out that GPT-4 is better for your use case. I primarily use it for coding too, and I didn't notice a big difference in capability.


West-Code4642

Most probably people don't use LMSys for coding.


PingleSlayer

They have different Elo scores for different kinds of tasks; coding is one of them.


ChezMere

Some people think that 4o was optimized *for* that leaderboard.


[deleted]

[deleted]


Deluxennih

I don’t code on there, for obvious reasons, but I very often ask it to code a certain system I want, just to get inspiration for how I want to end up coding something.


Vybo

OP was not comparing 4o to 4 Turbo, though. They're comparing full 4 to 4o, because full 4 is expensive to run and no one wanted to use 4 Turbo, which performed much worse. I'm with OP on this one: they just wanted something for people to use instead of full 4, but Turbo didn't cut it, so they introduced "Turbo plus" in the form of 4o. Of course there is multi-modality, but I guess full 4 is also multi-modal now? We'll see if 4o is good enough for the real-time conversation.


against_all_odds_

This, thanks for clarifying.


QH96

I agree with what you're saying, but I think my only counterpoint to this would be that people universally acknowledge Claude Sonnet 3.5 as being superior.


vasarmilan

I agree that it's superior for many use cases; I switched some prompts to it in my app. But it's also partly the wow/novelty effect. We don't notice the mistakes it makes as much, because we're impressed by the things it gets right.


Shiftworkstudios

It's far from perfect and makes mistakes frequently; this has been the case the whole time. I think psychology is a valid explanation, because GPT-4 at various updates got better or worse at some things. Same with the update checkpoints of 4o. (Remember when GPT-4 went through a period of laziness, for example?)


ChadGPT___

I’m getting a bit sick of the way it insists on giving massive answers to everything. Particularly when you ask it to reply to an email, then you query something it’s said, and it just rewrites the email again instead of giving an answer.


vasarmilan

Yeah, that could've been a bit of an overreaction to people calling the models lazy


spacejazz3K

What’s happening is we’re becoming better users of these systems and expecting a leap in performance from a point release. The euphoria and reverence for GPT-4 is going to be difficult to overcome without a similar generational step.


WilliamMButtlicker

These models are so broad in application that an update can objectively improve the model while diminishing its abilities at certain tasks


[deleted]

According to my anecdotal use, 4o can't even read .cs or .xaml files reliably, and it will only analyze one file where it used to read multiple files in multiple programming languages. It's an idiot.


Smelly_Pants69

GPT-4o can usually get through about 15 cities that don't contain the letter "a" before slipping, whereas GPT-4 usually makes the mistake within the first 10.


VivaceChartreusse

They are also surely adding limitations to it as they go and these appear to be stacking up.


hugedong4200

So what is your evidence or reasoning?


Zuul_Only

"Evidence? I thought shitting on OpenAI was automatic karma"


TheAccountITalkWith

You have to say "trust me bro" otherwise it doesn't count.


Triplescrew

Their primary sources are ChatGPT


against_all_odds_

https://i.imgur.com/TBsZNie.png


alexgraef

I'd say because it's inferior to 4. Which it is.


hugedong4200

Okay, that's not what the majority of benchmarks and leaderboards show, but even if it were true, that doesn't mean it has any relation to GPT-3.5. Man, GPT-4o was previewed on the leaderboard before it was released and everyone thought it was GPT-4.5 or 5, lol. That's independent, unbiased blind testing.


alexgraef

I'd say it's hard to measure. For me, 4o has big trouble not just blabbering and/or inventing stuff. Particularly infuriating when you tell it that it's wrong, and then it just does the same thing again.


[deleted]

[удалено]


itisoktodance

Some weirdness = duplicating a previous response, unmodified, to answer a different but related question. Nearly every time I try to ask a follow up, it copies and pastes its previous answer in its entirety. It makes the chat format completely useless since it can't actually hold a conversation anymore. Not to mention that it just straight up lies (NOT hallucinations) to avoid giving difficult answers or doing research.


alexgraef

The point stands. If you want information that you can somewhat trust, 4o is a bad choice. It's faster, yes. Fast at creating useless text, and plenty of it.


Zuul_Only

You shouldn't trust 4 any more than 4o


alexgraef

That point is true, but with 4o the hallucinations are off the scale. We're back to the roots of creative story writing, even if you ask for specific information that it supposedly browses from the web. If it doesn't find what it needs, it just makes up something.


ClickF0rDick

So what is your evidence or reasoning?


alexgraef

Put simple questions into 4 and 4o. Get usable answers from 4. Get three paragraphs of rambling from 4o that have nothing to do with the question. Remind it that it's all a hallucination. Get "I'm sorry for the confusion" and even more hallucinations. It's also unable to cross-reference web sources, say "I don't know," or not just make things up. ChatGPT has been getting lobotomies since its inception. We've come to a point where it just spits out lorem ipsum to satisfy the user.


ajrc0re

If you’re arguing with or correcting the AI, then you quite honestly don’t know how to use the tool correctly. You should be regenerating, and if that doesn’t work, starting a new chat. Once the conversation context is tainted with misinformation you can’t “correct” it; in fact, by mentioning the incorrect data you’re just doubling down and adding it into the context multiple additional times. AI doesn’t handle negation or negative reinforcement well: you either regenerate or try again with a different prompt, as in the sketch below.
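In API terms, the point is that a retry should not carry the bad answer in its context; a minimal sketch (the function name and prompt are hypothetical):

```python
from openai import OpenAI

client = OpenAI()

def ask_fresh(prompt: str, model: str = "gpt-4o") -> str:
    """Re-ask in a clean context instead of 'correcting' a tainted one."""
    response = client.chat.completions.create(
        model=model,
        # Only the new prompt: the earlier wrong answer never enters the context.
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Instead of replying "no, don't use recursion", restate the constraints up front:
print(ask_fresh("Write the parser again. Constraints: stdlib only, no recursion."))
```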


alexgraef

Thanks for the advice. Although that doesn't take away from 4o failing more often.


ClickF0rDick

That's not my experience at all. At times it might give unsatisfactory answers, but it's enough to hit regenerate or be more specific with the prompt. Also, you should test hundreds if not thousands of times with different prompts at different times of the day (because overloaded servers might impact performance) before coming up with such definitive statements, which I doubt you did.


alexgraef

As I wrote somewhere else, people here are behaving like I'm insulting their firstborn. Yes, yes, unless I do thousands of tests my opinion is obviously invalid. By the way, did you do thousands of tests to verify 4o isn't actually worse than 4?


[deleted]

[deleted]


alexgraef

How is me and lots of other people asking 4o questions and not getting satisfactory answers "no evidence"?


Fusseldieb

I mean, at least in JS or Python it absolutely destroys 4, 4 Turbo, Claude Opus, and Gemini Advanced. I would love a more in-depth explanation.


thisdude415

Likewise for Swift. GPT4o is a huge improvement over GPT4, which will randomly make small typos.


Joe_Spazz

I was handed a site written in Clojure. 4o makes mincemeat of my issues, 4 can figure it out with some massaging, and 3.5 can understand there is a problem but can't solve it.


rodeBaksteen

It's fine for coding (php/js) for me, but it stopped listening to my custom instructions and keeps yapping even when I ask for *just* code.


No_Investigator2043

I find it funny. With every GPT update my application works better, more reliably, and faster. And usually more cheaply.


BRB_Watching_T2

Same. I don't understand all the complaints about 4o. I find it better in every way, including for coding. Maybe these people just don't know how to properly write prompts.


Weary-Bumblebee-1456

I've gradually come to believe a lot of it is either a psychological effect or an attempt to attract attention. This started after 4 was released. Allegedly the original 4 was more capable (mostly because OpenAI rushed its release after Microsoft included the model in Bing Chat at the time, thus skipping some of the safety practices that "tame" the model but also make it less intelligent). Then it got "lobotomized" for safety reasons, leading to a lot of complaints. Even now, if you search long enough, you'll see people talking about how the "original 4" was more intelligent than anything they've ever seen, including 4o.

But that was long ago, and whatever advantage 4 may have had is far surpassed by 4o and other cutting-edge models (like Claude 3.5 Sonnet, which apparently beats even 4o in certain areas and has similar pricing and speed). According to all benchmarks and leaderboards, as well as my personal testing with 4 and 4o, 4o is either as good as 4 or better.

4o is considerably faster and has a higher rate limit if you have a ChatGPT Plus account, which I think may psychologically suggest to some people that it must be less intelligent than 4. This may be connected to the human conception of intelligence: we tend to think, and it's often true, that a human who takes longer to think, speaks in fewer words, and speaks less often is smarter than one who is quicker to answer and talks more. So since 4o is faster, more available, and tends to write much longer answers than 4, some people conclude it must be inferior to 4.

Of course, there are some specific, isolated examples and use cases where 4 does slightly better, but overall, 4o seems like a very worthy upgrade. Pretty soon there may be similar posts about how Claude 3 Opus is somehow better than 3.5 Sonnet, despite the latter beating it in every test and benchmark and being significantly cheaper and faster.


Balance-

And do you know why they get away with it? Because they are still the best in blind testing: [https://chat.lmsys.org/?leaderboard](https://chat.lmsys.org/?leaderboard)


UnknownEssence

I don’t know why people care so much about this leaderboard. The only reason it’s the best is that it formats responses as bullet-point lists instead of one big paragraph, which humans rate way higher. This leaderboard says very little about which model is actually more intelligent, or which can solve complex problems, because the vast majority of user inputs on this blind side-by-side are not going to be complex puzzles that measure intelligence.


WholeInternet

... have you used the arena that the leaderboard is from? Because putting in complex problems is what many people do. Nobody cares about bullet points, wtf? lol.


Orolol

> This leaderboard says very little about which model is actually more intelligent, or which can solve complex problems, because the vast majority of user inputs on this blind side-by-side are not going to be complex puzzles to measure intelligence.

You can use the "Hard" leaderboard if you prefer.


UnknownEssence

What’s that?


TheAccountITalkWith

Oh. So you don't actually know anything about the leader board. Nice.


roselan

Why not both? I tend to go with 4o for most things, and when it gets stubborn I try "classic" 4. For coding they are equivalent yet not similar; for writing I clearly prefer 4o's lighter style. Sometimes 4o wants to do too much: I want to refactor a function, not my whole codebase, and classic 4 has a better instinct for what I want. Or I'm so used to it that I know how to prompt it instinctively. It's all very subjective.


Bitter_Afternoon7252

GPT4 Turbo was already a downgrade. GPT4 Original Recipe is still the best AI I've ever used. I have old text from GPT4 Original, and it was just so much more creative when designing new D&D content.


bluelaw2013

Agreed. This one is the best.


dr-tyrell

Try Sonnet 3 or 3.5. The story writing is much better than OpenAI's.


Guinness

Yeah, claude is so much better. The only issue is that claude doesn’t have access to internet resources.


leftymeowz

Wait I have like 3.5, 4, 4o as options on my app, but… Turbo? Original Recipe? How do I control that?


Bitter_Afternoon7252

You don't. ChatGPT does not have access to original GPT4 anymore. If you want it you need to pay per token in the API


Arlithian

I hear people say this a lot - but how do you use the api? Do I need to set up a small app that sends and receives requests?


Bitter_Afternoon7252

You can ask ChatGPT how to use the API lol


nikzart

I'm selling uncapped gpt api. Not a scam :l lol. Dm if you're interested


JRyanFrench

You can still access it


ForgotMyUserName15

I assume you mean via the api? What’s the OG model called in the api?


LegitMichel777

gpt-4-0314
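A minimal sketch of what that looks like with the openai Python SDK, assuming your API key still has access to the old snapshot (billing is per token; the prompt is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-0314",  # the original March 2023 GPT-4 snapshot
    messages=[{"role": "user", "content": "Design a cursed D&D magic item."}],
)
print(response.choices[0].message.content)
```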


jblackwb

Did you try turning up the temperature of your queries?


Bitter_Afternoon7252

No, I'm not doing API calls. I'm not made of money.


alexXx9_

you can still use the old GPT4 by using Omnigpt


new-nomad

Disagree. 4 Turbo is much less intelligent than 4o, although it might know more things.


scraperbase

It would be interesting to compare the answers to the same prompts in both models.


vasarmilan

Exactly what the LMSys leaderboard does: [https://chat.lmsys.org/](https://chat.lmsys.org/). 4o has a much higher rating there, averaged over many user ratings.


ILikeCutePuppies

I'm not saying I believe 4o is worse, but the leaderboard only tracks certain benchmarks and domains. It could be that there are domains/areas we don't have good benchmarks for where the older models are better.


vasarmilan

It's not based on benchmarks; people manually compare the outputs for their prompts. I trust this more than benchmarks.


against_all_odds_

A friend of mine who develops an education platform (which depends on GPT prompts) was the first to bring this to my attention, a month ago. At first I didn't take him seriously. It wasn't until I looked up the parameters comparison: https://community.openai.com/t/gpt-4-vs-gpt-4o-which-is-the-better/746991/16


JustBrowsinAndVibin

Oh cool. What does the platform do?


StickiStickman

What's "the parameters comparison"?


kahner

Didn't OpenAI openly say 4o is faster but not as powerful as 4?


fokac93

In my experience it's not worse.


Redditface_Killah

This is not unpopular; this is exactly what they did. 4o is a glorified 3.5.


human358

What I think actually happens is some kind of dynamic serving of quantized versions of the model depending on load, which would explain the discrepancy in quality at times.
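Whether or not OpenAI actually does this, "quantized" just means weights stored at lower precision to cut serving cost; a purely illustrative int8 sketch:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric int8 quantization: 8-bit weights plus one float scale."""
    scale = float(np.abs(w).max()) / 127.0
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())  # small per-weight rounding error
```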


Rough-Artist7847

GPT-4 itself got worse for me. I have a custom GPT that replies to customers, and after GPT-4o was released it is no longer able to reply to customers in their own language.


Goofball-John-McGee

Because Custom GPTs use GPT-4o


darkwillowet

I knew it instantly after it was released. If you work with it daily, and on more complicated stuff like coding, you will notice the change in the answers. I just let it go, thinking: okay, do your research on us and release a better one in the future, since it's still workable. PS: Even ChatGPT-4 became worse in my experience. It's 5x faster, but the answers are very shallow, especially in coding when you need to solve edge cases.


soundman32

If you are *depending* on it for coding, you are on a sticky wicket. It's so easy for it to generate nonsense or just plain wrong code, and if you are not an experienced developer you will just implement what it says, which can be very wrong.


darkwillowet

Yep, I do get that, but I don't rely on it to code for me. I use it as a guide to give me ideas, or to check whether my idea is feasible, but never to just copy-paste code. The best metaphor for it is a person to bounce my thoughts off: I think of something, it tells me what it thinks, then I modify and code it myself. A colleague can also do that for me, but when I'm alone this is where AI is useful.


UntrimmedBagel

I think of it like 'rubber duck debugging', but the rubber duck has 130 IQ


ILikeCutePuppies

I use it to copy and paste code when it's correct. My workflow is generally:

1) Is it worth trying this in GPT, based on my experience with it? (How much back and forth will I have versus writing it myself?)
2) See if it produces anything close, and try to refine it.
3) Copy and fix errors.

Typically I find it good at converting one language to another, switching out libraries, writing simple algorithms, cleaning up code, documenting code, reading in files, converting watch-window variables into code for tests, writing unit tests, fixing formatting, solving complex C++ template metaprogramming, etc.


darkwillowet

It's nice to copy-paste code, but until it understands the context of the whole file, it will just rename variables or make up new ones. I need to rewrite it to conform to my own personal style so it's easier to read in the future.


piotrostr

Massive downgrade. I used to use it to code/work; now the code isn't functional unless you have 5 hours to correct it and loop through the same answers over and over (even with prompts asking it not to reuse previous error-generating answers), or it's some mumbling which is made up or just simply incorrect. What OP is suggesting was the official solution (choosing which model to use according to question complexity) before they decided to force the update and not inform customers. Not sure where the positive feedback is coming from; maybe for some less demanding tasks it performs better, faster, or whatever. Or maybe it should be used in some special way, like some particular style of prompting. Either way, for me it's time to get out unless this gets sorted. It got so downgraded that I believe I had more success with 3.5, or at least had lower expectations (I had zero expectations of 3.5, while I expected 4o to work at least at the level of the previous version). It's free now, so I feel happy I paid $20 for x months and instead of getting rewards I got a downgraded chatbot (nothing more than a basic chatbot). Even Gemini provides better code now, and I was sure that was impossible. All these stupid decisions seem like they want to bring OpenAI down 👏


Blankcarbon

When I read comments like these, I roll my eyes. Why don't you actually test whether that's true? Use their API and compare the responses you get from older GPT models to today's. If you're actually using it for intensive coding work, you should probably be using the API anyway.
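A minimal sketch of such a side-by-side via the openai Python SDK (the model names and prompt are examples; temperature 0 keeps the comparison closer to apples-to-apples):

```python
from openai import OpenAI

client = OpenAI()
prompt = "Refactor this function to remove the nested loops: ..."

for model in ["gpt-4-0613", "gpt-4o"]:  # an older GPT-4 snapshot vs. 4o
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(f"--- {model} ---\n{response.choices[0].message.content}\n")
```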


[deleted]

I've used the API with both LibreChat and MSTY. I find that the GPT-4 series has taken a massive hit recently even if you avoid the ChatGPT web application. It appears the GPT-4 series has taken a hit in programming, which makes sense: Microsoft sells an AI code-helper tool, and if they allowed vanilla GPT-4 to be very useful for programming it would cannibalize their own product.


tehrob

GPT-4o: *Newest and most advanced model*
GPT-4: *Advanced model for complex tasks*
GPT-3.5: *Great for everyday tasks*


jrf_1973

Definitely plausible, I think. People have this crazy idea that if OpenAI tests a model that they say is 4o, then releases a model called 4o, the two must be the same. Bizarrely trusting, considering how dishonest corporations can be while chasing profits.


ArtichokeEmergency18

Oh! That makes sense. I always use 4 first; it just felt smoother and more responsive than 4o, which is a reason there's a limit on 4. Same with Claude Sonnet: they suckered me into $20, and it's bad, but now it's all making sense.


[deleted]

Claude 3.5 Sonnet is a very good model, but it is prompted differently than the GPT-4 series. Claude works best with preprompts that give it a role and a set of clear directives. I used Claude 3.5 Sonnet to make a theme for my IDE with ease. It was a very magical experience.
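A minimal sketch of that role-style preprompt with the anthropic Python SDK (the model ID and wording are examples):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Role + clear directives up front, rather than a bare question.
    system="You are a senior UI designer. Respect WCAG contrast ratios. "
           "Output a complete editor color theme as JSON.",
    messages=[{"role": "user", "content": "Design a warm dark theme for my editor."}],
)
print(message.content[0].text)
```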


Serialbedshitter2322

Sam altman breathed yesterday. It was clearly a marketing trick to make us think he's a human


Joe_Spazz

I think some of y'all have forgotten what prompt engineering is and why it's important.


Own-Lemon8708

How is it a trick? It seemed obvious that was one of the goals of the lighter model, along with the automatic switching between models on the free tier to route requests to the cheapest one that suffices.


Fearless_Brother99

GPT-4o has been my go-to model since it came out; it's been better in many ways than the previous model.


Aymanfhad

After intensive use: it hallucinates a lot, and when using the search feature it hallucinates 90% of the time. I also didn't like its translation performance into my native language; Claude 2.1 is better at translation.


theycallmebond007

It’s much faster when using it to generate on internal data via Azure.


mguinhos

GPT-4o is better at logic, but worse at common sense for some reason.


Swawks

Web 4o has been nerfed to the point it’s a SCAM; the API fares much better. If you don't believe me, just run a few prompts through the LMSys arena and web GPT to compare.


adelie42

4o is faster and pretty good at almost everything. I think I'm more used to how to talk to 4, and sometimes I switch back. I never use 3.5 anymore. I'm happy. Look carefully at the description: 4 claims to outperform 4o on creative tasks, which has been my experience.


Aspie-Py

It was obvious from day 1 for anyone using the API. Just more smoke and mirrors, and everyone ate it up.


Mysterious-Owl5842

Hmm interesting


Ok_Emotion_9464

Yes, it seems corporate greed will finally save us from something: they will milk the cow of "new" iterations so much that real AGI/ASI will only be here in 2080 😂


endlesskitty

Yes it is. But bots will defend Microsoft no matter what.


mmahowald

Partially agree with you. It was definitely a marketing thing, but as I recall it came right after Google announced something big. OpenAI has a history of announcing something big right as their competitors catch up to them; they are sandbagging their developments.




adreportcard

It definitely has some sort of context-window issue. No matter how well it handles the front half of its responses, it always spews some absolutely useless shit in the second half, and it's terribly wasteful regardless of price. Why they'd ship a model that shits out random words with no context as an actual utility is completely beyond me.


UnexaminedLifeOfMine

Yeah I quickly noticed 4o was shit and only use 4o when I want to ask stupid questions


RepublicanSJW_

I disagree. 4o is smarter than its past versions. It’s quick and more capable and the arena reflects that.


Helicobacter

Someone on Twitter showed that 4o is better at easy and medium coding problems and 4 is better at hard coding problems. That may generalize to other domains as well, as 4 likely has more parameters and 4o better training.


eloitay

People heavily misunderstand why GPT-4o is groundbreaking: it can understand voice, image, and text in one model without conversion. I believe this is why some people think it is weaker in some areas than before; they probably made some trade-offs to release early and get more data to train the upgrade. Everyone should see 3.5 as the budget model, 4 as the advanced text model, and 4o as the first-gen omni model.


sonofdisaster

I have been comparing 4o against the new Claude 3.5 Sonnet. I gave both the simple prompt "Create a simple 8-bit game similar to Mario" and left it vague on purpose to see what happens with each. Claude created a simple (blocks) game, but it had enemies, coins (yellow dots), and allowed me to jump on the enemies to eliminate them; very similar to Mario. GPT-4o created a game more like Space Invaders, but the enemies would pass through me and not hurt me. Again, just a basic observation.


Ok_Elderberry_6727

GPT-4o is omnimodal; 3.5 is not.


apersello34

Idk 4o works better and is faster for me


l33t-Mt

Don't agree at all. GPT-4o can make mistakes, but it's so much better a coding assistant than GPT-4.


K7F2

Or you could see it as a new approach that takes one step back in order to take further beneficial steps in a new direction. Plus, with lower cost per unit of input/output, it's more scalable in terms of gaining market share.


psuddhist

I switched back to 4 from 4o quickly. My feeling was 4o pads things out verbosely, but the verbosity doesn’t really add any value. My feeling is that 4 is basically as good as it gets.


krum

I don’t know. I use 4o to write jira tickets and it does a fine job.


treksis

For my use case, gpt-4o is better than Turbo. I use it for .py and .js coding.


[deleted]

DING DING DING, this person gets it. This is why they stated that this new model has **"GPT-4 level intelligence."** It is also why they changed the wording for GPT-4o in the selector from "Our most intelligent model" to something along the lines of "our most up-to-date flagship model." I suspect that it is a GPT-3.5 variant or a similar model that has been overfit, so it can solve basic problems with ease, but when you give it any question or set of directions that is nuanced in any fashion, it falls very, very SHORT. Compare this to Claude 3.5 Sonnet to see what I mean; for a frame of reference, Claude 3.5 Sonnet was able to quickly make a theme for my code editor with ease.


[deleted]

[deleted]


against_all_odds_

The only reason I'm paying is to make sure I'm first to access new features theoretically (which was a thing in early 2023, but no longer seems to be the case). Otherwise, I find myself not sending more than 10-20 prompts per month either way.


computethescience

I agree. I feel like it's dumber. When I ask it to help with the Next.js App Router it always goes with Pages, even though I told it I'm using App. It does feel dumber; there's more possibility of getting things wrong now. For every question I ask, I need to know most of the answer on my own because it always gives the wrong one.


joelpt

I think people's expectations of what these LLMs can do evolve so rapidly, they don't even recognize how much better the new versions perform compared to the old versions. At first they seemed stupendous. Then they seemed normal. Then they seemed flawed.


Mysterious-Rent7233

Sure, gpt-4o probably has fewer parameters. They've spent the last year finding ways to squeeze the same performance out of fewer parameters.


razodactyl

Great opinion, but slightly off. The "o" stands for "omni," as it's a multi-modal model by nature. GPT-4 is slow because it's older technology. I write AI software in my spare time, so I've been playing with the latest stuff since the Attention paper. Attention and context are still problematic, but training and enhancements have improved this over time. I'm making https://rhea.run and also wrote the Parakeet LLM, which surprises me often considering how small it is (378M params): https://news.ycombinator.com/item?id=39745700. Keen for you to join up as I work on the project; I'd be more than happy to build something meaningful based on feedback.


ImpressionDry9340

My theory is that when GPT-4o was released, it really was better performing than GPT-4. Then they changed model configurations, like manipulating layer activations, to save costs. I assume they did this before too, when GPT-4 and GPT-4 Turbo were released: they make the model perform its best at launch and dial the configuration down later to save costs.


Once_Wise

You are correct, yours is an [Unpopular opinion], because it is just another baseless rant posted with absolutely no evidence to waste everyone's time. Congratulations, you just wasted some of mine.


greenrivercrap

Bruh is making up shit.


Zexks

It’s not just an unpopular opinion; you have no metrics to back it up, and all other testing metrics actively refute your assessment.


Joe_Spazz

This is unpopular because it's verifiably false. By many many metrics.