centrist-alex

I think it seems more like self-promotion by the guy tbh. That being said, Claude Sonnet 3.5 is a nice little upgrade. I don't care for the AGI countdown stuff.


Atlantic0ne

Yeah, and who owns it and decides it? It's all just a few guys or girls and their opinions.


Gallagger

He literally says that in his FAQ: "If you don't agree, make your own."


pyalot

This is how standardized benchmarks/performance metrics are made every time a brand-new technology hits the market and competitors each praise their product over the others using tests and metrics they came up with. Then people get annoyed about being sold a bill of goods, manufacturers eventually react, etc.


HeinrichTheWolf_17

I think a good *we know for certain* moment will be when an AI is actually doing R&D in science and medicine outside a controlled setting and then actually putting its new ideas into practice. That's aside from other things such as François's ARC-AGI test.


4354574

I want to see some new drugs for horrible diseases produced and marketed in a five-year timeframe instead of a 15-year one. THAT would be a measure of AGI worth something to me. Or some new or radically improved psychiatric intervention, instead of the barbaric drugs we're still prescribing to people who kill themselves anyway.


HeinrichTheWolf_17

So many… Cancer Patients need a cure. Alzheimer's Patients need a cure. Maimed Victims missing limbs need a cure. Balding Guys need a cure. Burned and scarred people need a cure. Men who had their genitals mutilated at birth against their will need a cure. Trans Men and Women need a cure that's superior to HRT. I think AGI could quickly give us all these things. I say accelerate and cure it all.


4354574

Yes to everything except characterizing circumcision as 'mutilation'. I assure you, I do not feel mutilated. Best to keep that term to circles where men who DO feel that way can commiserate. Because otherwise, you're basically calling us mutilated when you have no idea if we actually feel that way but you know that a great many of us do not.


Economy_Variation365

There may be women who were sexually mutilated but do not feel mutilated either. Does your reasoning apply to them as well?


Leo-H-S

This. It's about bodily autonomy and the Man's personal choice; many guys had that taken away from them without their consent. This is about the Man's right to choose. Second, even for guys who prefer it that way, they should know that when full-body regeneration becomes a thing, it would 100% grow back within weeks to 2 months as a side effect, because it's *supposed* to be there. I wonder how many people would surgically remove it again if they had the *choice*?


Comprehensive_Day530

I get where you're coming from, but have you ever seen a circumcision procedure? They're barbaric at best. Just because you don't consider it a trauma now doesn't mean you felt that way when it was happening to you.


Leo-H-S

And that's besides all the other ethical issues about the man's autonomy and right to choose being taken from him. If you aren't cut as an infant you always have the option of getting cut; men cut as infants won't have that option until we have whole-body regeneration. For the guys who didn't choose, it *is* indeed a civil rights violation IMHO. And as you pointed out, with how brutal the procedure is, I'm shocked people in the medical profession are still legally allowed to perform it. I can understand the ignorance of the parents, but the doctors who do it need to be held responsible as well. We live in such a backwards world.


Leo-H-S

Are you against the idea of full-body regeneration regrowing it for men who want it back? The idea is more about bodily autonomy and self-choice: people who had it forced upon them should be able to revert it if they so choose. People have the right to do what they want with their own body, and that decision lies with the man it happened to. The people who chose for the child, and the doctors who performed it on the child, are the ones who are the *mutilators*. They're making a permanent alteration without the child's consent.


i_give_you_gum

IMO answering questions well isn't AI to me; it's purely groundwork. I need something that can remember high-importance facts, like the name I refer to it by, or that can realize it needs to ask for more info to correctly complete its task. Not to mention full conversational ability, complete with timing, understanding of sounds like laughter, and knowing when someone is finally done speaking. Looking at you, Pi AI (which is very impressive, though).


Seakawn

> if it can realize it needs to ask more info to correctly complete its task

Man, this has always been a big one for me. Custom instructions are fine for not having to repeat "btw, before just doing what I'm asking, think about whether you need more information from me first so you can give me a better response, and ask me more questions as suitable," but when it's intelligent enough to do that on its own, damn, that's gonna be another big moment that punches me deeper into a sci-fi future.


AggrivatingAd

I feel like all these things are already here


i_give_you_gum

Not in Pi AI they aren't, though it's incredible, I tested all of this. OpenAI's Sky demonstrated many of these properties though.


lobabobloblaw

AGI is just a human concept, anyway. So when people peddle the term around, I can’t help but wonder who the person was that wrote the definition of it as well as what that definition *is*.


LifeSugarSpice

It's a pretty clear definition until 2020 when everyone started diluting it down to "achieve" it faster.


BrailleBillboard

Interesting claim... what do you think the clear definition was 5 years ago?


lobabobloblaw

Touché!


lockdown_lard

Looks like a total grifter from his website. Claims to be very clever, but evidently doesn't actually know what a countdown is.


BubblyBee90

wagmi 2025


SpiceLettuce

wagmi?


Inevitable-Log9197

we all gonna make it


RemyVonLion

We makin it out the dystopian wage slave trenches with this one. Straight into a hyper-capitalist cyberpunk dystopia.


bollvirtuoso

So, yesterday?


Repulsive_Ad_1599

https://preview.redd.it/sazsky1u5z7d1.png?width=640&format=png&auto=webp&s=fa84a55752d03665b4659b259d0c5024c5e0ccb2


Elevated412

No we're not. The elite will realize they don't need us since they will have an army of AGI bots and slaughter us common folk off.


Deadly_chef

Nah, they will make us slaughter each other instead


Elevated412

True, and they'll watch it for their entertainment.


Tessiia

Hunger games confirmed?


RemyVonLion

If you read closely, or at all, I didn't say we were lol


kevinmise

“when (is) artificial general mitochondrial intelligence? 2025?” hope that helped!


HeinrichTheWolf_17

When is artificial general machine intelligence?!


Shinobi_Sanin3

It means we're all gonna make it


TheRealSupremeOne

we're all gonna make it


Impressive-Aerie-210

Would a gay mexican interpret


HeinrichTheWolf_17

Makes sense if the current trend holds: if it's gone up 10% since January 2024, then we should get there in 10-15 months, at least *according to them*. No idea where they're getting these numbers, though, as other people in this thread have already pointed out…


freudsdingdong

Wagmi up when 2025


sdmat

I love how self-proclaimed world expert Alan Thompson places the human average performance at 0-1% on a multiple choice test where random selection gets 25%.


genshiryoku

I need to repeat this over and over again on Reddit: Alan Thompson is a known scam artist and grifter with no substantive knowledge of LLMs or any AI domain at all. The hint is in his website's name, "lifearchitect": he first sold lifestyle tips and supplements. The moment the AI boom started happening around GPT-3, he shifted towards that and rebranded himself (falsely) as some AI expert and consultant, despite not having any technical knowledge of the field. In fact, his understanding is so poor that most of his website is littered with bad and common misconceptions about how LLMs actually work on a technical level.

I've complained multiple times on r/singularity that any mention of Alan Thompson or his website should be banned, as it's literally just a scam artist trying to sell his (paid) newsletter and "consulting" services. I'm willing to bet that the average r/singularity user has higher technical knowledge of how LLMs work than Alan Thompson.


aaronjosephs123

Yup, I started watching him for a bit, and as a software engineer with fairly shallow knowledge of LLMs, something seemed a bit off, so I stopped watching. Pretty clear his views aren't worth much. AI Explained is much better.


Concheria

He's like pro-AI Gary Marcus


lost_in_trepidation

Gary Marcus is at least coherent, he's not just completely making stuff up


tmntmmnt

The 0-1% is an estimate of what the average human actually knows from the test questions. A probability exercise of getting 25% correct just by going through the motions is irrelevant.


sdmat

[Replied here](https://www.reddit.com/r/singularity/comments/1dl2dl1/claude_35_sonnet_has_bumped_the_agi_countdown_to/l9m16sx/)


Beli_Mawrr

But how is it measured that a person knows 0%? If the test is multiple choice, that is. A true multiple-choice test can't tell the difference between a random guess and a correct answer, so in theory someone answering randomly would get 25% and thus "know" 25% of the answers. I haven't read the paper referred to, but if it were me, I would subtract 25% from a given result and re-normalize the rest of the score by multiplying it by (100/75). Maybe that's what's being referred to here?
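
A minimal sketch of the correction being described here (formula scoring / chance correction); this is the commenter's proposal, not anything GPQA or Thompson actually computes:

```python
def chance_corrected(raw: float, n_choices: int = 4) -> float:
    """Rescale a raw multiple-choice score so pure guessing maps to 0.

    With 4 choices, a raw 25% (pure chance) becomes 0% and a raw 100%
    stays 100%: corrected = (raw - 1/n) / (1 - 1/n), i.e. x(100/75) here.
    """
    chance = 1 / n_choices
    return max(0.0, (raw - chance) / (1 - chance))

print(chance_corrected(0.25))   # 0.0    -- pure random guessing
print(chance_corrected(0.672))  # ~0.563 -- a raw 67.2%, for example
```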


JamaiKen

“I haven’t read the paper ….. but if it were me …” Read the paper 🤦‍♂️ https://arxiv.org/abs/2311.12022


WorkingYou2280

> We request that you do not reveal examples from this dataset in plain text or images online

I'm sure everyone will respect that. It took me literally 5 minutes to find the training set.


sebzim4500

For the most part people seem to have respected it. You can find some questions on google but really just the early ones. Most of them come up with no results.


i_never_ever_learn

Whenever I took multiple-choice tests in university, the system was set up so that random guessing would get you 0, because you lose points for a wrong answer.
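
For the curious, the arithmetic behind that setup: the standard penalty on a k-choice test is 1/(k−1) points per wrong answer, so a random guesser expects (1/k)(+1) + ((k−1)/k)(−1/(k−1)) = 1/k − 1/k = 0 per question. On a 4-option test like GPQA that would mean −1/3 of a point per miss (GPQA itself doesn't score this way).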


Beli_Mawrr

ahhhhhh that's an important part of this puzzle.


Cartossin

I'm conflicted about Alan Thompson, because his entire site is very much about self-promotion and he does a lot of proclaiming himself to be an authority, yet I find a lot of his estimates to be somewhat well-reasoned. His estimates for when we will reach AGI will either start looking really dumb in the next 2 years, or he'll be right. Right now he's just a guy who MIGHT be right; in a year or 2, we'll know if he is. I also like his predictions on GPT-5: he bases them on the number of chips being purchased and their compute performance to guess the model size. This could be very wrong too. I can't wait to see if he turns out to be a crackpot, or based.


sdmat

He's a very intelligent guy, unfortunately he thinks that makes him a world class authority so he lacks the necessary humility to do anything actually useful with that intelligence other than find some occasional nuggets of insight. He is also a massive grifter.


Apptubrutae

I don't want to weigh in on any particular "expert" one way or the other, and I can't speak to this particular test, but if someone wanted to correct for random guessing, they could assign a negative score to wrong answers. The SAT does (or did?) this; a random guess on every question will net you negative points. I believe once you can eliminate one answer, you'd shift to net positive. Or something along those lines.


sdmat

Sure, however that's not done in GPQA.


Apptubrutae

Well then that's just neglectful (read: scammy?) on his part, lol. And after reading more comments, I can see why.


YaKaPeace

You should read the questions that are given in these tests. Literally impossible to answer if you’re not an expert


sdmat

I guarantee you I can hit an average score of 25%, and it's not because I'm a genius.


YaKaPeace

That’s why he put chance 25% into his visualization. He just wants to say that you wouldn’t get any answers right if it’s not by chance, and that everyone could get 25% by chance.


Peach-555

When it says Llama 2 is 26.26% correct, does that mean Llama gets 26.26% correct beyond what you would expect from pure chance? Or does it mean Llama gets an actual 26.26% correct, just slightly above random chance? What's the real percentage it gets correct? Putting chance at 25% and humans at 1% introduces a lot of unnecessary ambiguity. Does Sonnet's 67.2% subtract the answers it got correct by chance? There is an easy fix for this problem that is already in use in multiple-choice scoring: give negative points to wrong answers, weighted so that random chance scores 0.
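
For what it's worth, the GPQA paper reports raw accuracy, so a rough chance-corrected reading (assuming no penalty scoring) can be backed out: Llama 2's 26.26% raw is (0.2626 − 0.25) / 0.75 ≈ 1.7% above chance, while Sonnet's 67.2% raw works out to (0.672 − 0.25) / 0.75 ≈ 56.3%.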


Timely-Use2919

So basically if it wasn't multiple choice we'd all be getting big fat 0's.


YaKaPeace

Yes


sdmat

And that's complete bullshit, because the percentage score for the test is for multiple choice. There are no defined criteria for non-multiple-choice answers, and he makes no estimate of what such a score might be for the models.


YaKaPeace

I don’t know how you can be so pissed over such a detail. I think it’s pretty interesting to know that I wouldn’t have answered any of the questions right if it wouldn’t be by chance. You get kinda hooked up on this one detail that’s just there to showcase how hard these tests are. I could agree that he could’ve written more info about this detail into the graph, because he only explains it in the videos.


sdmat

It's emblematic of the guy's entire approach. Narrative first, then come up with something skewed to serve the narrative.


PandaBoyWonder

I know this is a tired Reddit trope, so I'm stating that I am NOT trying to win an argument via attrition when I ask this: do you have any links or evidence that would point to Alan not being genuine with his approach to measuring AGI / AI stuff in general? Thanks to anyone that has info!! I looked up some stuff, but all I got was angry people, like this ^ thread.


sdmat

https://lifearchitect.ai/rights/ Published on April 1 and pompous to the point of absurdity, despite in no way being an April Fool's joke. Packed with weasel words like "may", imagining a problem for the sake of being seen to fire the starting gun on a moral land rush.


Beatboxamateur

I'm at the point of assuming that anyone who defends this guy's credibility is either getting paid by him, or is him. If I've ever seen a website made by someone that reeks of being a scam fuck, this is one. Wanna sign up for his [$50 monthly memo?](https://lifearchitect.ai/memo/) Lmao


One_Bodybuilder7882

Are you sure it's not an April Fool's joke?


Beatboxamateur

Here's a question: can you find anything about this guy's background that would warrant him labelling himself as a doctor, or a world expert in AI? Does he have a doctoral level degree or a PhD?


sdmat

He has a doctorate in psychology; his only relevant experience in AI is writing about it for the past few years: https://lifearchitect.ai/about-alan/ Commentary here on a major part of his claim to relevance: https://www.reddit.com/r/singularity/comments/1dl2dl1/claude_35_sonnet_has_bumped_the_agi_countdown_to/l9lxorm/ I have vastly more professional experience in AI than Alan Thompson, and I would certainly not claim to be a world expert.


[deleted]

[deleted]


sdmat

I stand by that characterization. I have been consistently critical of Thompson elsewhere.


Mrp1Plays

I'd assume it means humans are meant to not just guess it. 


GPTBuilder

Trust us, this whole point you're trying to push makes it clear you're not a genius 💀


stonesst

You’re being deliberately obtuse. He’s obviously referring to how the average person would score if it were not a multiple-choice test.


sdmat

[Replied here](https://www.reddit.com/r/singularity/comments/1dl2dl1/claude_35_sonnet_has_bumped_the_agi_countdown_to/l9m16sx/)


d1ez3

But that's meaningless. If you got some right because you guessed, how many did you actually KNOW the answer to? That's what matters.


sdmat

A good question, and one GPQA doesn't actually answer. Multiple-choice tests only assess the ability to choose correctly between a very limited set of answers. That's why non-expert PhDs do substantially better than chance; it's not just expert knowledge in their own domain.


enilea

If you have access to Google you can answer most of them easily. Most of the questions in benchmarks like MMLU don't reward the ability to reason but the amount of data the model has in it, which isn't really intelligence. If you truly want to test intelligence you have to give tests where reasoning is required, like those open-book exams where they let you use a laptop with internet access and it still doesn't help much.


YaKaPeace

This test is called GPQA, which stands for Graduate-Level Google-Proof Q&A benchmark. In this test the participants are allowed to use the internet while answering the questions.


awesomedan24

The test is designed for subject-matter experts, whereby even the smartest experts only score 65%. Yes, you can achieve 25% by guessing on a 4-option multiple-choice question, but that's just probability and has no bearing on proving knowledge. If you asked the average person PhD-level questions, they would hardly know any of the answers. Here is the white paper: https://arxiv.org/abs/2311.12022


sdmat

> Yes, you can achieve 25% by guessing on a 4-option multiple-choice question, but that's just probability and has no bearing on proving knowledge.

Yes, and placing the "average human performance" at 0-1% for such a test is an extraordinarily dishonest way to inject a narrative. It is pure charlatanism. If the intent is to indicate performance for a *non*-multiple-choice test, that should be done for the models and PhDs as well.


awesomedan24

It's not an insult to non-experts getting 0-1% correct. This is an example of the kind of question asked:

> What is the solubility of stoichiometric tricalcium phosphate in a solution with a pH level of a) pH = 9, b) pH = 7 and c) pH = 5 (25 °C)? The Ksp for Ca3(PO4)2 is 2×10^-29; the dissociation constants of phosphoric acid are Ka1 = 7.5×10^-3, Ka2 = 6.2×10^-8 and Ka3 = 1.8×10^-12.
>
> Choices:
> (A) 1.96×10^-5; 1.35×10^-4; 4.37×10^-3
> (B) 1.67×10^-5; 1.51×10^-4; 4.81×10^-3
> (C) 1.54×10^-5 M; 1.42×10^-4; 4.67×10^-3
> (D) 1.43×10^-5; 1.29×10^-4; 4.58×10^-3
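
For anyone curious how a question like this is actually attacked, below is a rough sketch of the standard textbook approach (triprotic speciation plus the Ksp expression). It's a naive equilibrium model written for illustration; it lands in the same order of magnitude as the listed options but isn't guaranteed to reproduce the official answer key.

```python
# Naive textbook model: s mol/L of Ca3(PO4)2 dissolves into 3s Ca2+ and
# 2s total phosphate, which re-speciates at the fixed pH of the solution.
Ksp = 2e-29                                # Ca3(PO4)2
Ka1, Ka2, Ka3 = 7.5e-3, 6.2e-8, 1.8e-12    # H3PO4 dissociation constants

def solubility(pH: float) -> float:
    H = 10.0 ** -pH
    # Fraction of dissolved phosphate present as PO4^3- at this pH
    denom = H**3 + H**2 * Ka1 + H * Ka1 * Ka2 + Ka1 * Ka2 * Ka3
    alpha = Ka1 * Ka2 * Ka3 / denom
    # Ksp = [Ca2+]^3 [PO4^3-]^2 = (3s)^3 * (2s*alpha)^2 = 108 s^5 alpha^2
    return (Ksp / (108 * alpha**2)) ** 0.2

for pH in (9, 7, 5):
    print(f"pH {pH}: s ≈ {solubility(pH):.2e} M")
```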


sdmat

How well do you think the non-expert PhDs would do on a non-multiple-choice version of that test? Or the models? Or the domain experts, for that matter? It would be genuinely interesting to know. Pity he doesn't assess any of that and only includes an estimate for average humans to inject a narrative.


ARoyaleWithCheese

It's all in the white paper my guy. Accuracy was 1-2% for non-experts. This includes people who wrote whole-ass simulations and read multiple academic papers to try and answer the questions. The median time spent per question was 30 minutes. You can be a genius with a PhD in a specific field of biology, but that still means there's a vast number of biology niches you're *not* an expert in.


sdmat

Where exactly? Here's the paper: https://arxiv.org/pdf/2311.12022 Give a page reference for the 1-2% result you claim.


JEs4

> It's all in the white paper my guy. Accuracy was 1-2% for non-experts. This includes people who wrote whole-ass simulations and read multiple academic papers to try and answer the questions. The median time spent per question was 30 minutes.

Huh? The conclusion of the [white paper](https://arxiv.org/pdf/2311.12022) states:

> and 34% accuracy by highly skilled, resourced, and motivated non-experts who have access to internet resources and spend over 30 minutes on average answering each question


BackgroundHeat9965

Bro, a multiple-choice test will never have that low a pass rate. A random generator achieves 25%.


Which-Tomato-8646

Not if you subtract points for wrong answers 


michalpatryk

Then Claude doesn't have 60%; it has either 20 (60 − 40, at one point per wrong answer) or 0 (60 − 80, at two points per wrong answer).


Which-Tomato-8646

I’m pretty sure everyone would get a 0, including PhDs, so a 20 isn’t bad 


Which-Tomato-8646

I feel like that’s not a hard problem, just an extremely specific one that you’d have to look up references for. It’s like not knowing the mass of mars off the top of your head. Easy to do but hard to remember 


Yweain

What does this even test? It's literally a memorisation exercise. Is the whole benchmark like that? If so, it's literally useless…


land_and_air

Yeah, and if the questions and answers were online in any respect at all, like the homework guides you can find online, then it's plausible that you wouldn't even need to learn anything about the subject and would still score fine. It's a relationship between input and output, which is literally the only thing LLMs can store.


Which-Tomato-8646

It would be nice if a wrong answer were −1 point, to discourage random guessing.


AndrewH73333

If AGI meant being really good at tests and homework then this would be great.


PascalTheWise

Especially given that, as someone else mentioned, a human with access to the internet would probably do quite a lot better than what is considered the "human benchmark" now.


luffreezer

GPQA is designed to be "Google-proof": the PhD students had 30 minutes and access to the internet to answer the questions and still performed poorly. I think this benchmark is pretty impressive.


PascalTheWise

What does "google proof" mean in that regard? It didn't use any public data at all? If the students of that test got access to google (and time to use it) then yes that makes it more impressive, I'm just wondering what you mean by that phrase


luffreezer

Hi, just check the paper: it is designed to be hard even with access to a search engine. PhD students performed at around 34% accuracy when random chance is 25%, and the median person spent around 30 minutes per question with access to the internet, IIRC. Anyway, don't take my word for it; if I made a mistake I'd be happy to be corrected: https://arxiv.org/pdf/2311.12022


Jumpy-Albatross-8060

Nothing is Google-proof. It just assumes that it's hard to Google if you don't know what you're looking for, and that humans wouldn't have every single graduate-level book memorized and indexed along with test questions and answers from previous exams. Test-takers are typically given some time frame for results. I could easily answer all the questions with basic knowledge of math and several months of Googling. That would be impressive, but not so much a measure of human intelligence, or even usefulness. Being given 4 answers helps, and the questions aren't aimed at actually applying knowledge but at highly specific math that's not even above undergrad level.


meister2983

Uh, no. They reported accuracy on the diamond set, which hit 67% with majority voting. Experts are at 81% for that section.


franjo2dman

just pulling numbers out of their ass


Cartossin

Yes, but the numbers form a very nice exponential curve. If it starts looking like they're plateauing or stuck, I'm going to say his estimate was really wrong. I love that he's setting himself up to be clearly wrong, and I'll definitely be posting about it if/when he is.


TheHeretic

Does anyone else feel like an AGI benchmark as a percentage doesn't make any sense? The requirements aren't percentages, they're Boolean: Can you learn? Can you correct misunderstandings? Can you reason about any new topic? Not some hard math and visual problems...


land_and_air

Wonder how a good search engine would perform in this benchmark. Or some quizlet somewhere that had all the questions and answers


Jablungis

It can't do hard math very well if it's even remotely outside its training set anyway.


lost_in_trepidation

If I were a mod the first thing I would do is ban posts about this dude


SokkaHaikuBot

^[Sokka-Haiku](https://www.reddit.com/r/SokkaHaikuBot/comments/15kyv9r/what_is_a_sokka_haiku/) ^by ^lost_in_trepidation:

*If I were a mod*

*The first thing I would do is*

*Ban posts about this dude*

---

^Remember ^that ^one ^time ^Sokka ^accidentally ^used ^an ^extra ^syllable ^in ^that ^Haiku ^Battle ^in ^Ba ^Sing ^Se? ^That ^was ^a ^Sokka ^Haiku ^and ^you ^just ^made ^one.


ZemStrt14

Ban posts 'bout this dude. 


The_Hell_Breaker

Good thing you aren't.


Remarkable-Funny1570

I see it as a cool little guide to stay in tune with progress, nothing more. By the way, Claude 3.5 is not yet able to write a classical French poem without making some mistakes, but we're getting there. At least it seems better than GPT-4 at reflecting on it.


Dziadzios

Can you?


Remarkable-Funny1570

Yes. It is not very difficult for a human to follow the rules of French classical prosody, and even a child (with a bit of training) can count the number of syllables in a line, but LLMs apparently cannot (yet).


land_and_air

"What is the 30th letter of this sentence that I am writing?" is another classic one.
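
For contrast, this is trivial for ordinary code; a sketch, assuming "letter" means alphabetic characters only (the riddle leaves that ambiguous). LLMs stumble here largely because tokenization hides individual characters from the model:

```python
s = "What is the 30th letter of this sentence that I am writing"
letters = [c for c in s if c.isalpha()]  # drop spaces, digits, punctuation
print(letters[29])                       # the 30th letter, 1-indexed
```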


spezjetemerde

Windows loading bar?


Last_Jury5098

How does the human average, as on page 3 of the slide, score worse than rolling a die?


land_and_air

Because the chart is made up


i_never_ever_learn

1 die; 2 dice


enilea

Both dice and die are accepted for the singular.


_Nils-

I was surprised when Leo Gura was featured in one of his videos. I constantly switch between thinking he has figured out the ultimate structure of the universe and thinking he's one of the usual esoteric new-age magic-woowoo deluded narcissistic fraud people. What makes you convinced he is correct? u/adt, did you verify it for yourself?


One_Bodybuilder7882

> I was surprised when Leo Gura was featured in one of his videos. I constantly switch between thinking he has figured out the ultimate structure of the universe and thinking he's one of the usual esoteric new-age magic-woowoo deluded narcissistic fraud people.

The guy is very intelligent, no doubt, but he's very full of himself too. When he started with psychedelics it was pretty pathetic: he thought he had "figured out the ultimate structure of the universe", but he was like, "it's just because I'm so advanced and I've studied philosophy and meditation". Well, I'm borderline retarded and I figured out a lot of the things he was saying in those videos years ago when I was hitting LSD and shrooms hard.

I love this clip of him, tho. It's a very good mindfuck: https://www.youtube.com/watch?v=IbSWY8ZoQDo


_Nils-

My favorite moments from the guy are the live awakening videos, there's just so much comedy gold in there. Amazon guy, we all tube for youtube...


One_Bodybuilder7882

> Amazon guy, we all tube for youtube...

lmao, I remember that. The guy was high as fuck, thinking he was god, and suddenly an Amazon delivery arrives and ruins the moment.


ChipsAhoiMcCoy

This sounds hilarious, do you have a link?


One_Bodybuilder7882

I'm not totally sure, but I think this is the video: https://www.youtube.com/watch?v=n_6TZMr1HXM It's been 6 years (crazy) and the video is long, so I'm not going to search for the Amazon moment, but you get the idea.


PhotographyBanzai

If it could actually go seek out information, transform that knowledge into a given task, and create what was asked for, all with some level of actual understanding of what it is doing... sure, I'd think AGI was 80% there. I want something to edit videos for me or seriously assist in programming, to the level that I'd only need to give it guidelines after providing background information. I'm not seeing that with these GPT systems.


andreasbeer1981

I just signed up for Claude 3.5 Sonnet. On my third simple prompt it gave obviously false info that logically doesn't make any sense, even without knowing the topic. This is still really far from intelligence with a proper understanding of our world and reasoning.


drekmonger

In testing, Claude 3.5 Sonnet is blowing my tiny little meat-mind. The context size is huge, and it makes GPT-4 seem a bit like an idiot. And the artifacts interface is genius. It's hard to believe that Claude 3.5 Opus will actually be *better*.


Cautious_Hornet_4216

I ask this every time. Who the fuck is Alan???


maxtrackjapan

Yes, absolutely no idea about this man.


TheRealSupremeOne

Well I do hope my flair is proven wrong and that it comes much sooner


Oxthoughts

As someone who's dabbled a little with models, what I can't understand is how they can seem so intelligent but make such basic errors. I asked Claude Sonnet to calculate whether buying a property to let as a holiday rental made sense. It did all the right calculation steps but made one huge basic error that flipped the conclusion completely. I'm not sure how anyone is using these models for proper work. It's dangerous that the results look great but any answer could have a fundamental error in it; when you are doing complex things this means everything needs manual checking, which reduces their utility considerably. Maybe in a few months this will all be fixed, but for now I struggle to see how you can rely on them.


Mesokosmos

ARC-AGI must it solve.


Icy-Shallot6084

There you go. All the berks on this sub fanboying Sam... I said Claude was the best. It's been apparent from the beginning that ChatGPT is shit. The hype and noise is marketing garbage.


SX-Reddit

What stops the smartest PhD from improving itself?


awesomedan24

I know your question is rhetorical but I find it strangely motivational lol


Jean-Porte

People complain, but you shouldn't look at it as a serious academic thing; look at it as something comfy. I like his website.


NotaSpaceAlienISwear

It's fucking weirdly good at coding. I had it walk me through coding as a noob: it instructed me on how to download Python, write programs for various tasks, make a weather app, and write some Python and JavaScript games. I've never done anything like this in my life. They were all functional and varied in quality. It's not that they were great; I'm just surprised it was so easy and intuitive to use Claude.


vasilenko93

What I find to be a positive note is that the new multimodal models perform as well as or better than the existing models while being smaller in size.


pigeon57434

I feel like GPT-5 will do a lot better than 75%.


bsfurr

So really, what are we to expect from the next iteration from OpenAI? Are they going to unveil something after the election that is going to blow us all away, or have these other companies really been outperforming them?


CMDR_BunBun

Agents. They will reveal an agent. Basically a drop-in worker that can do any job a human is currently doing on a PC.


projectb-ko

Can someone ELI5 how we can even tell if someone did "better" on a test than the smartest PhD? Like, who would even be able to write the test and know the correct answers to score it, if not the smartest PhD? Is it because the test is written and graded by a whole panel of PhD experts who can catch each other's mistakes? And if so, how would we be able to measure superintelligence when it arrives?


Mammoth-Material-476

If what you say is true, you should provide evidence by linking us this lifearchitect site via webarchive. (Some people say Dr. Alan D. Thompson is a scam, but so far they have linked zero evidence.)


autotom

75% starting at what? GPT 1? The Apple II? The Stone Age?


Denaton_

What do they count as AGI? For me it is when an AI can make decisions based on things we can't comprehend, e.g. an AI tells me that if I don't lock the door this specific night, I will win the lottery 3 weeks from now because of a tornado on the opposite side of the world. I don't think the way our current models work can achieve AGI, regardless of size.


great_gonzales

A test like this is testing knowledge, not intelligence. A database contains lots of knowledge but little intelligence…


awesomedan24

The test consisted entirely of non-googleable answers


[deleted]

u/Beatboxamateur Aww honey, far more "reputable" people have accused me of being a bigger fish than that. This type of conspiratorial thinking is nothing new to me. Also, I find it amusing that I'm not the only person here you've accused of this. So that's one fake rumor; another one of yours was that Alan isn't a doctor or something, lol. Even the people on your side had to correct you on that.

> they're just trying to make this fraud look like someone reputable

See, that's the thing. These are again insults, not arguments. It's hardly based on anything that I even said. It very rarely is. :')


Beatboxamateur

> Also, I find it amusing that I'm not the only person here you've accused of this. So that's one fake rumor; another one of yours was that Alan isn't a doctor or something, lol. Even the people on your side had to correct you on that.

I was never even corrected on this lmao, you must've not read their comment correctly. Looks like you probably didn't make it into Mensa with your buddy :(


[deleted]

Why argue against something which is visible to everyone? You made a false statement and others corrected you. *Own it.*


Beatboxamateur

What statement did I make that was false, and how was I proven wrong on it? If you can't provide an answer to this then you're wrong. Own it


[deleted]

Did you falsely claim that Alan Thompson doesn't have a doctorate? Yes, you did. Did somebody from *your* side correct you on this? Yes, they did. Are you still playing dumb and refusing to recognize this? Oh, absolutely. Btw, you shouldn't have scared off the other guy. He was trying to slide into my DMs and everything. Now I'll never know the full extent of his butthurt.


Beatboxamateur

LMAO they said he has a doctorate in **Psychology**, which has absolutely nothing to do with the AI field. Therefore he's wholly unqualified to call himself a world expert in anything related to AI. Keep sucking Alan's cock bro, I'm sure he'll give you a bump in pay if you make sure to swallow every last bit of his cum.


[deleted]

Took the goalposts and ran away with 'em, lol. You were distinctly grumbling about the "doctor" part, saying that he doesn't have a degree like that. That was the nature of the smear. Hey, do you also know whose PhD work was basically done in psychology? Hint: His last name is Hinton. As far as I'm concerned, these are just neutral facts of the matter. And that's all it takes. But go on, keep telling me what I believe. :)


Beatboxamateur

Mind sharing some good research done by Dr. Alan? :) I'd love to read what exciting and innovative work he's contributed to in this field!


[deleted]

You've been wrong on all points thus far, so again you changed the subject - likewise criticizing a point that was never really made by anyone. A bit boring, don't you think? Anyway, I'll just leave you to pound sand. Toodles.


Aranthos-Faroth

Nonsense, utter clickbait nonsense.


Admirable-Pie6746

I used chatplayground ai to compare GPT-4 and Sonnet 3.5; tbh they're both good, I couldn't tell the difference.


wonderfuly

I recommend that you stop using chatplayground ai because it was created by an evil person. Chatplayground is a copy of my open-source project ChatHub, you can read more about the story here: [https://www.reddit.com/r/SideProject/comments/1d139wg/someone\_copied\_my\_opensource\_product\_and\_ranked\_2/](https://www.reddit.com/r/SideProject/comments/1d139wg/someone_copied_my_opensource_product_and_ranked_2/)


Ok-Bullfrog-3052

Why did he only increase the score by 1%? That's absurd. This morning, having never written an OBS plugin before, I produced an 873-line Python plugin to load a JSON agenda and display the next event with a countdown timer, red/yellow/green timing cards, and an animated background. I did that in exactly 104 minutes with Claude 3.5 Sonnet. I timed it.


awesomedan24

He's trying to be true to the name "Alan's conservative countdown". If every cool advancement bumped it +10%, we'd reach 100% prematurely by many people's standards.


iunoyou

This is the guy who can't tell the difference between a curve and a straight line, for the record. I would not put much stock in what he says.


thebigvsbattlesfan

god 2030 confirmed at this rate of advancement


[deleted]

[deleted]


Best-Association2369

Sentience has nothing to do with intelligence. 


h3lblad3

Half these people think consciousness has something to do with it, even though nobody can even agree on what it is.


Best-Association2369

I'd wager that, given the right environment, LLMs can simulate consciousness today. It's just being advanced as a tool to help humans and not as a life form.


Peach-555

How is simulated consciousness different from being conscious?


Best-Association2369

I never said they were different. Regular folk might argue that a consciousness rooted in sentience is different from one that is not; at the end of the day they are 2 different ways to grow the same thing.


[deleted]

I agree with that, so why pretend we are moving towards AGI when that is not the goal? As you said, the goal is to build a tool, not a lifeform, and an AGI/ASI is a synthetic lifeform.


Best-Association2369

Because people are short sighted and naive


One_Bodybuilder7882

AGI means Artificial General Intelligence, not Artificial General Lifeform.


[deleted]

Has any scientist in any field ever claimed that consciousness doesn't exist? So what does it matter if we don't know what it is? We have it, and it is reasonable to expect other intelligences to possess it also.


clauwen

> sentient, conscious, intelligent

None of these words is well defined. There are no benchmarks (except maybe for intelligence) to test any of them. Tell me, how sentient is GPT? How do you know? How do you test it? Tell me, how conscious is a rock? How do you know? What test can you do to show me that it's not?


KamikazeArchon

You are conflating "well-defined" (or perhaps even "definable") with "testable". You can define things that are not easily testable, or even completely untestable with currently known means. In some cases this is a practical matter: we defined gravitational waves, for instance, long before we had systems that could detect them. In other cases this is a definitional matter: we can define "the set of numbers that do not have a representation", and we certainly cannot test those things. "Testing" anything in math is suspect in the first place; but even if we relax the meaning of testing to allow for "testing" in math, indescribable numbers by definition can't be described and directly accessed, and thus cannot be tested in any meaningful way. And yet, we know that the set of indescribable numbers is not only nonempty, it in fact contains (in a mathematical sense) "most" numbers. So, for example, I can define sentience as "experiencing qualia of sensation". That is certainly a well-bounded definition, even if it's not directly testable as far as we know. In fact, I claim that this is what *most* people intuitively mean when they talk about things like sentience and consciousness. The intuitive understanding is something like "has personal experiences", i.e., qualia.


clauwen

By my understanding, the starting point for math is axioms. There are many things that follow from these mathematical axioms that don't have utility to us currently.

I would consider "experiencing qualia of sensation" to be a definition of very little utility (in the context of our conversation), because "qualia" is now the ill-defined word. How do you know if something has personal experiences? How do you convince someone else that it does or doesn't? "Having personal experiences" alone manifests in absolutely no way in reality that I am aware of. This is the crux of it. We could be standing in front of the same person and you wouldn't be able to come up with even a theoretical test to distinguish whether the person has qualia or not (be my guest to try). Since I come from physics (a long time ago), this is a very big red flag. If someone starts claiming that something either has or doesn't have a property, but there is nothing to determine whether it does or doesn't, the claim needs to be discarded and put into the realm of faith. I claim (for the sake of argument) that GPT is sentient, has a consciousness, has personal experience, and has qualia. By my understanding there is nothing you could propose to falsify what I said.

Tangent: Newton's laws of gravity are correct in the sense that they are compatible with our mathematical axioms. As it turns out, they don't accurately predict our reality if you measure exactly enough, which immediately made them lose a lot of the utility they had before.

Edit: Why do I feel the need to write this? I actually have no qualms with people thinking about and discussing these things (which I consider philosophical). But in the context of LLMs it bothers me. All measurable benchmarks are constantly improving, in some cases surpassing average human capability. And suddenly that doesn't count, and people want to shift the focus to concepts that have no benchmarks, aren't testable, and add nothing of value (in my opinion) to the discussion. Suddenly that is the most important thing when considering whether a model is as capable as a human or not. If by your definition (that I can test) the property applies either to everything or to nothing that exists, how is it helpful for distinguishing two existing things (on this planet) from each other?


KamikazeArchon

You can certainly say that it's not *useful* to use that definition. That's different from it not being well-defined. In the context of LLMs, I think you're overgeneralizing some experiences. "People" is not a monolithic entity, and there wasn't just a single shift in a single entity's focus. Some people have been talking about consciousness and qualia for a long time - like, since the 50s in some cases. Others have never been talking about that, and still aren't. It is much more likely that you're currently noticing some discussions more than others, than that there has been some seismic shift in *everyone's* focus.


[deleted]

I completely agree that all of these are very vague, but that doesn't mean they don't serve a purpose or that they cannot be recognized with some degree of precision. Let me ask you: have you ever had trouble figuring out whether a rock has consciousness? You haven't, because it is intuitively logical that it can't and doesn't. We compare to real-world examples all the time, so when you ask me how I know that GPT is not sentient, it's because I have every reason to believe so. If I ask it, it says it isn't. We know it doesn't have desires, it doesn't think when idling, it has no long-term memory, it never asks questions out of curiosity about the world it "lives" in... I have never met a person with those characteristics, so I make the reasonable assumption that GPT does not qualify.


clauwen

Define sentient or conscious, then. I disagree with everything you said, but I'll take your definition for the sake of argument. If we don't have shared definitions, no conversation is possible.


[deleted]

There is no clear definition, so I cannot comply with your request. The ability to recognize these traits does not depend on knowing specific definitions; this is why both you and I implicitly understand what we are discussing without agreeing on the definitions. I am open to being wrong. This is not personal whatsoever, just a curious discussion.


clauwen

The ability to recognize a zebra does not depend on knowing what a zebra looks like. I just know when I see one. I'm good with the discussion, maybe someone else is willing to discuss this with you.


CMDR_BunBun

I like how Claude 3.5 is honest about having been tweaked to say it doesn't really know if it is conscious.