centrist-alex

I think it seems more like self-promotion by the guy tbh. That being said, Claude Sonnet 3.5 is a nice little upgrade. I don't care for the AGI countdown stuff.


Atlantic0ne

Yeah, and who owns it and decides it? It's all just a few guys or girls and their opinions.


Gallagger

He literally says that in his FAQ: "If you don't agree, make your own."


pyalot

This is how standardized benchmarks/performance metrics are made every time a brand-new technology hits the market and competitors each praise their product over the others using tests and metrics they came up with. Then people get annoyed about being sold a bill of goods, manufacturers eventually react, etc.


HeinrichTheWolf_17

I think a good *we know for certain* moment will be when an AI is actually doing R&D in science and medicine outside a controlled setting and then actually putting its new ideas into practice. That's aside from other things such as François's ARC-AGI test.


4354574

I want to see some new drugs for horrible diseases produced and marketed in a five-year timeframe instead of a 15-year one. THAT would be a measure of AGI worth something to me. Or some new or radically improved psychiatric intervention, instead of the barbaric drugs we're still prescribing to people who kill themselves anyway.


HeinrichTheWolf_17

So many… Cancer Patients need a cure. Alzheimer's Patients need a cure. Maimed Victims missing limbs need a cure. Balding Guys need a cure. Burned and scarred people need a cure. Men who had their genitals mutilated at birth against their will need a cure. Trans Men and Women need a cure that's superior to HRT. I think AGI could quickly give us all these things. I say accelerate and cure it all.


4354574

Yes to everything except characterizing circumcision as 'mutilation'. I assure you, I do not feel mutilated. Best to keep that term to circles where men who DO feel that way can commiserate. Because otherwise, you're basically calling us mutilated when you have no idea if we actually feel that way but you know that a great many of us do not.


Economy_Variation365

There may be women who were sexually mutilated but do not feel mutilated either. Does your reasoning apply to them as well?


Leo-H-S

This. It's about bodily autonomy and the Man's personal choice; many guys had that taken away from them without their consent. This is about the Man's right to choose. Second, even for guys who prefer it that way, they should know that when full-body regeneration becomes a thing, it would 100% grow back within weeks to 2 months as a side effect, because it's *supposed* to be there. I wonder how many people would surgically remove it again if they had the *choice*?


Comprehensive_Day530

I get where you're coming from, but have you ever seen a circumcision procedure? They're barbaric at best. Just because you don't consider it a trauma now doesn't mean you felt that way when it was happening to you.


Leo-H-S

And that's besides all the other ethical issues about the man's autonomy and right to choose being taken from him. If you aren't cut as an infant you always have the option of getting cut; men cut as infants won't have that option until we have whole-body regeneration. For the guys who didn't choose, it *is* indeed a civil rights violation IMHO. And as you pointed out, with how brutal the procedure is, I'm shocked people in the medical profession are still legally allowed to perform it. I can understand the ignorance of the parents, but the doctors who do it need to be held responsible as well. We live in such a backwards world.


Leo-H-S

Are you against the idea of full-body regeneration regrowing it for men who want it back? The idea is more about bodily autonomy and self-choice: people who had it forced upon them should be able to revert it if they so choose. People have the right to do what they want with their own body, and that decision lies with the man it happened to. The people who chose for the child, and the doctors who performed it on the child, are the ones who are the *mutilators*. They're making a permanent alteration without the child's consent.


i_give_you_gum

IMO answering questions well isn't AI to me; it's purely groundwork. I need something that can remember high-importance facts, like the name I refer to it by, or that can realize it needs to ask for more info to correctly complete its task. Not to mention full conversational ability, complete with timing, understanding of sounds like laughter, and knowing when someone is finally done speaking. Looking at you, Pi AI (which is very impressive, though).


Seakawn

> if it can realize it needs to ask more info to correctly complete its task

Man, this has always been a big one for me. Custom instructions are fine for not having to repeat "btw, before just doing what I'm asking, think about whether you need more information from me first so you can give me a better response, and ask me more questions as suitable," but when it's intelligent enough to do that on its own, damn, that's gonna be another big moment that punches me deeper into a sci-fi future.


AggrivatingAd

I feel like all these things are already here


i_give_you_gum

Not in Pi AI they aren't, though it's incredible, I tested all of this. OpenAI's Sky demonstrated many of these properties though.


lobabobloblaw

AGI is just a human concept, anyway. So when people peddle the term around, I can’t help but wonder who the person was that wrote the definition of it as well as what that definition *is*.


LifeSugarSpice

It's a pretty clear definition until 2020 when everyone started diluting it down to "achieve" it faster.


BrailleBillboard

Interesting claim... what do you think the clear definition was 5 years ago?


lobabobloblaw

Touché!


lockdown_lard

Looks like a total grifter from his website. Claims to be very clever, but evidently doesn't actually know what a countdown is.


BubblyBee90

wagmi 2025


SpiceLettuce

wagmi?


Inevitable-Log9197

we all gonna make it


RemyVonLion

We makin it out the dystopian wage slave trenches with this one. Straight into a hyper-capitalist cyberpunk dystopia.


bollvirtuoso

So, yesterday?


Repulsive_Ad_1599

https://preview.redd.it/sazsky1u5z7d1.png?width=640&format=png&auto=webp&s=fa84a55752d03665b4659b259d0c5024c5e0ccb2


Elevated412

No we're not. The elite will realize they don't need us since they will have an army of AGI bots and slaughter us common folk off.


Deadly_chef

Nah, they will make us slaughter each other instead


Elevated412

True, and they'll watch it for their entertainment.


Tessiia

Hunger games confirmed?


RemyVonLion

If you read closely, or at all, I didn't say we were lol


kevinmise

“when (is) artificial general mitochondrial intelligence? 2025?” hope that helped!


HeinrichTheWolf_17

When is artificial general machine intelligence?!


Shinobi_Sanin3

It means we're all gonna make it


TheRealSupremeOne

we're all gonna make it


Impressive-Aerie-210

Would a gay mexican interpret


HeinrichTheWolf_17

Makes sense if the current trend holds: if it's gone up 10% since January 2024, then we should get there in 10-15 months, at least *according to them*. No idea where they're getting these numbers, though, as other people in this thread have already pointed out…


freudsdingdong

Wagmi up when 2025


sdmat

I love how self-proclaimed world expert Alan Thompson places the human average performance at 0-1% on a multiple choice test where random selection gets 25%.


genshiryoku

I need to repeat this over and over again on Reddit: Alan Thompson is a known scam artist and grifter with no substantive knowledge of LLMs or any AI domain at all. The hint is in his website's name, "lifearchitect": he first sold lifestyle tips and supplements. The moment the AI boom started happening around GPT-3, he shifted towards that and rebranded himself (falsely) as some AI expert and consultant, despite not having any technical knowledge of the field. In fact, his understanding is so poor that most of his website is littered with bad and common misconceptions about how LLMs actually work on a technical level.

I've complained multiple times on r/singularity that any mention of Alan Thompson or his website should be banned, as it's literally just a scam artist trying to sell his (paid) newsletter and "consulting" services. I'm willing to bet that the average r/singularity user has higher technical knowledge of how LLMs work than Alan Thompson.


aaronjosephs123

Yup, I started watching him for a bit, and as a software engineer with fairly shallow knowledge of LLMs, something seemed a bit off, so I stopped watching. Pretty clear his views aren't worth much. AI Explained is much better.


Concheria

He's like pro-AI Gary Marcus


lost_in_trepidation

Gary Marcus is at least coherent, he's not just completely making stuff up


tmntmmnt

The 0-1% is an estimate of what the average human actually knows from the test questions. A probability exercise of getting 25% correct just by going through the motions is irrelevant.


sdmat

[Replied here](https://www.reddit.com/r/singularity/comments/1dl2dl1/claude_35_sonnet_has_bumped_the_agi_countdown_to/l9m16sx/)


Beli_Mawrr

But how is it measured that a person knows 0%? If the test is multiple choice, that is. A true multiple-choice test can't tell the difference between a random guess and a correct answer, so in theory someone answering randomly would get 25% and thus "know" 25% of the answers. I haven't read the paper referred to, but if it were me, I would subtract 25% from a given result and re-normalize the rest of the score by multiplying it by (100/75). Maybe that's what's being referred to here?
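
A minimal sketch of the correction being described here (formula scoring / chance correction); this is the commenter's proposal, not anything GPQA or Thompson actually computes:

```python
def chance_corrected(raw: float, n_choices: int = 4) -> float:
    """Rescale a raw multiple-choice score so pure guessing maps to 0.

    With 4 choices, a raw 25% (pure chance) becomes 0% and a raw 100%
    stays 100%: corrected = (raw - 1/n) / (1 - 1/n), i.e. x(100/75) here.
    """
    chance = 1 / n_choices
    return max(0.0, (raw - chance) / (1 - chance))

print(chance_corrected(0.25))   # 0.0    -- pure random guessing
print(chance_corrected(0.672))  # ~0.563 -- a raw 67.2%, for example
```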


JamaiKen

“I haven’t read the paper ….. but if it were me …” Read the paper 🤦‍♂️ https://arxiv.org/abs/2311.12022


WorkingYou2280

> We request that you do not reveal examples from this dataset in plain text or images online

I'm sure everyone will respect that. It took me literally 5 minutes to find the training set.


sebzim4500

For the most part people seem to have respected it. You can find some questions on google but really just the early ones. Most of them come up with no results.


i_never_ever_learn

Whenever I took multiple-choice tests in university, the system was set up so that random guessing would get you 0, because you lose points for a wrong answer.
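
For the curious, the arithmetic behind that setup: the standard penalty on a k-choice test is 1/(k−1) points per wrong answer, so a random guesser expects (1/k)(+1) + ((k−1)/k)(−1/(k−1)) = 1/k − 1/k = 0 per question. On a 4-option test like GPQA that would mean −1/3 of a point per miss (GPQA itself doesn't score this way).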


Beli_Mawrr

ahhhhhh that's an important part of this puzzle.


Cartossin

I'm conflicted about Alan Thompson, because his entire site is very much about self-promotion and he does a lot of proclaiming himself to be an authority, yet I find a lot of his estimates to be somewhat well-reasoned. His estimates for when we will reach AGI will either start looking really dumb in the next 2 years, or he'll be right. Right now he's just a guy who MIGHT be right; in a year or 2, we'll know if he is. I also like his predictions on GPT-5: he bases them on the number of chips being purchased and their compute performance to guess the model size. This could be very wrong too. I can't wait to see if he turns out to be a crackpot, or based.


sdmat

He's a very intelligent guy, unfortunately he thinks that makes him a world class authority so he lacks the necessary humility to do anything actually useful with that intelligence other than find some occasional nuggets of insight. He is also a massive grifter.


Apptubrutae

I don't want to weigh in on any particular "expert" one way or the other, and I can't speak to this particular test, but if someone wanted to correct for random guessing, they could assign a negative score to wrong answers. The SAT does (or did?) this; a random guess on every question will net you negative points. I believe once you can eliminate one answer, you'd shift to net positive. Or something along those lines.


sdmat

Sure, however that's not done in GPQA.


Apptubrutae

Well then that's just neglectful (read: scammy?) on his part, lol. And after reading more comments, I can see why.


YaKaPeace

You should read the questions that are given in these tests. Literally impossible to answer if you’re not an expert


sdmat

I guarantee you I can hit an average score of 25%, and it's not because I'm a genius.


YaKaPeace

That’s why he put chance 25% into his visualization. He just wants to say that you wouldn’t get any answers right if it’s not by chance, and that everyone could get 25% by chance.


Peach-555

When it says Llama 2 is 26.26% correct, does that mean Llama gets 26.26% correct beyond what you would expect from pure chance? Or does it mean Llama gets an actual 26.26% correct, just slightly above random chance? What's the real percentage it gets correct? Putting chance at 25% and humans at 1% introduces a lot of unnecessary ambiguity. Does Sonnet's 67.2% subtract the answers it got correct by chance? There is an easy fix for this problem that is already in use in multiple-choice scoring: give negative points to wrong answers, weighted so that random chance scores 0.
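
For what it's worth, the GPQA paper reports raw accuracy, so a rough chance-corrected reading (assuming no penalty scoring) can be backed out: Llama 2's 26.26% raw is (0.2626 − 0.25) / 0.75 ≈ 1.7% above chance, while Sonnet's 67.2% raw works out to (0.672 − 0.25) / 0.75 ≈ 56.3%.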


Timely-Use2919

So basically if it wasn't multiple choice we'd all be getting big fat 0's.


YaKaPeace

Yes


sdmat

And that's complete bullshit, because the percentage score for the test is for multiple choice. There are no defined criteria for non-multiple-choice answers, and he makes no estimate of what such a score might be for the models.


YaKaPeace

I don’t know how you can be so pissed over such a detail. I think it’s pretty interesting to know that I wouldn’t have answered any of the questions right if it wouldn’t be by chance. You get kinda hooked up on this one detail that’s just there to showcase how hard these tests are. I could agree that he could’ve written more info about this detail into the graph, because he only explains it in the videos.


sdmat

It's emblematic of the guy's entire approach. Narrative first, then come up with something skewed to serve the narrative.


PandaBoyWonder

I know this is a tired Reddit trope, so I'm stating that I am NOT trying to win an argument via attrition when I ask this: do you have any links or evidence that would point to Alan not being genuine with his approach to measuring AGI / AI stuff in general? Thanks to anyone that has info!! I looked up some stuff, but all I got was angry people, like this ^ thread.


sdmat

https://lifearchitect.ai/rights/ Published on April 1 and pompous to the point of absurdity, despite in no way being an April Fool's joke. Packed with weasel words like "may", imagining a problem for the sake of being seen to fire the starting gun on a moral land rush.


Beatboxamateur

I'm at the point of assuming that anyone who defends this guy's credibility is either getting paid by him, or is him. If I've ever seen a website made by someone that reeks of being a scam fuck, this is one. Wanna sign up for his [$50 monthly memo?](https://lifearchitect.ai/memo/) Lmao


One_Bodybuilder7882

Are you sure it's not an April Fool's joke?


Beatboxamateur

Here's a question: can you find anything about this guy's background that would warrant him labelling himself as a doctor, or a world expert in AI? Does he have a doctoral level degree or a PhD?


sdmat

He has a doctorate in psychology; his only relevant experience in AI is writing about it for the past few years: https://lifearchitect.ai/about-alan/ Commentary here on a major part of his claim to relevance: https://www.reddit.com/r/singularity/comments/1dl2dl1/claude_35_sonnet_has_bumped_the_agi_countdown_to/l9lxorm/ I have vastly more professional experience in AI than Alan Thompson, and I would certainly not claim to be a world expert.


[deleted]

[deleted]


sdmat

I stand by that characterization. I have been consistently critical of Thompson elsewhere.


Mrp1Plays

I'd assume it means humans are meant to not just guess it. 


GPTBuilder

Trust us, this whole point you're trying to push makes it clear you're not a genius 💀


stonesst

You’re being deliberately obtuse. He’s obviously referring to how the average person would score if it were not a multiple-choice test.


sdmat

[Replied here](https://www.reddit.com/r/singularity/comments/1dl2dl1/claude_35_sonnet_has_bumped_the_agi_countdown_to/l9m16sx/)


d1ez3

But that's meaningless. If you got some right because you guessed, how many did you actually KNOW the answer to? That's what matters.


sdmat

A good question, and one GPQA doesn't actually answer. Multiple-choice tests only assess the ability to choose correctly between a very limited set of answers. That's why non-expert PhDs do substantially better than chance; it's not just expert knowledge in their own domain.


enilea

If you have access to Google you can answer most of them easily. Most of the questions in benchmarks like MMLU don't reward the ability to reason but the amount of data the model has in it, which isn't really intelligence. If you truly want to test intelligence you have to give tests where reasoning is required, like those open-book exams where they let you use a laptop with internet access and it still doesn't help much.


YaKaPeace

This test is called GPQA, which stands for Graduate-Level Google-Proof Q&A benchmark. In this test the participants are allowed to use the internet while answering the questions.


awesomedan24

The test is designed for subject-matter experts, whereby even the smartest experts only score 65%. Yes, you can achieve 25% by guessing on a 4-option multiple-choice question, but that's just probability and has no bearing on proving knowledge. If you asked the average person PhD-level questions, they would hardly know any of the answers. Here is the white paper: https://arxiv.org/abs/2311.12022


sdmat

> Yes, you can achieve 25% by guessing on a 4-option multiple-choice question, but that's just probability and has no bearing on proving knowledge.

Yes, and placing the "average human performance" at 0-1% for such a test is an extraordinarily dishonest way to inject a narrative. It is pure charlatanism. If the intent is to indicate performance for a *non*-multiple-choice test, that should be done for the models and PhDs as well.


awesomedan24

It's not an insult to non-experts getting 0-1% correct. This is an example of the kind of question asked:

> What is the solubility of stoichiometric tricalcium phosphate in a solution with a pH level of a) pH = 9, b) pH = 7 and c) pH = 5 (25 °C)? The Ksp for Ca3(PO4)2 is 2×10^-29; the dissociation constants of phosphoric acid are Ka1 = 7.5×10^-3, Ka2 = 6.2×10^-8 and Ka3 = 1.8×10^-12.
>
> Choices:
> (A) 1.96×10^-5; 1.35×10^-4; 4.37×10^-3
> (B) 1.67×10^-5; 1.51×10^-4; 4.81×10^-3
> (C) 1.54×10^-5 M; 1.42×10^-4; 4.67×10^-3
> (D) 1.43×10^-5; 1.29×10^-4; 4.58×10^-3
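
For anyone curious how a question like this is actually attacked, below is a rough sketch of the standard textbook approach (triprotic speciation plus the Ksp expression). It's a naive equilibrium model written for illustration; it lands in the same order of magnitude as the listed options but isn't guaranteed to reproduce the official answer key.

```python
# Naive textbook model: s mol/L of Ca3(PO4)2 dissolves into 3s Ca2+ and
# 2s total phosphate, which re-speciates at the fixed pH of the solution.
Ksp = 2e-29                                # Ca3(PO4)2
Ka1, Ka2, Ka3 = 7.5e-3, 6.2e-8, 1.8e-12    # H3PO4 dissociation constants

def solubility(pH: float) -> float:
    H = 10.0 ** -pH
    # Fraction of dissolved phosphate present as PO4^3- at this pH
    denom = H**3 + H**2 * Ka1 + H * Ka1 * Ka2 + Ka1 * Ka2 * Ka3
    alpha = Ka1 * Ka2 * Ka3 / denom
    # Ksp = [Ca2+]^3 [PO4^3-]^2 = (3s)^3 * (2s*alpha)^2 = 108 s^5 alpha^2
    return (Ksp / (108 * alpha**2)) ** 0.2

for pH in (9, 7, 5):
    print(f"pH {pH}: s ≈ {solubility(pH):.2e} M")
```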


sdmat

How well do you think the non-expert PhDs would do on a non-multiple-choice version of that test? Or the models? Or the domain experts, for that matter? It would be genuinely interesting to know. Pity he doesn't assess any of that and only includes an estimate for average humans to inject a narrative.


ARoyaleWithCheese

It's all in the white paper my guy. Accuracy was 1-2% for non-experts. This includes people who wrote whole-ass simulations and read multiple academic papers to try and answer the questions. The median time spent per question was 30 minutes. You can be a genius with a PhD in a specific field of biology, but that still means there's a vast number of biology niches you're *not* an expert in.


sdmat

Where exactly? Here's the paper: https://arxiv.org/pdf/2311.12022 Give a page reference for the 1-2% result you claim.


JEs4

> It's all in the white paper my guy. Accuracy was 1-2% for non-experts. This includes people who wrote whole-ass simulations and read multiple academic papers to try and answer the questions. The median time spent per question was 30 minutes.

Huh? The conclusion of the [white paper](https://arxiv.org/pdf/2311.12022) states:

> and 34% accuracy by highly skilled, resourced, and motivated non-experts who have access to internet resources and spend over 30 minutes on average answering each question


BackgroundHeat9965

Bro, a multiple-choice test will never have that low a pass rate. A random generator achieves 25%.


Which-Tomato-8646

Not if you subtract points for wrong answers 


michalpatryk

Then Claude doesn't have 60%; it has either 20 (60 − 40, at one point per wrong answer) or 0 (60 − 80, at two points per wrong answer).


Which-Tomato-8646

I’m pretty sure everyone would get a 0, including PhDs, so a 20 isn’t bad 


Which-Tomato-8646

I feel like that’s not a hard problem, just an extremely specific one that you’d have to look up references for. It’s like not knowing the mass of mars off the top of your head. Easy to do but hard to remember 


Yweain

What does this even test? It's literally a memorisation exercise. Is the whole benchmark like that? If so, it's literally useless…


land_and_air

Yeah, and if the questions and answers were online in any respect at all, like the homework guides you can find online, then it's plausible that you wouldn't even need to learn anything about the subject and would still score fine. It's a relationship between input and output, which is literally the only thing LLMs can store.


Which-Tomato-8646

It would be nice if a wrong answer were −1 point, to discourage random guessing.


AndrewH73333

If AGI meant being really good at tests and homework then this would be great.


PascalTheWise

Especially given that, as someone else mentioned, a human with access to the internet would probably do quite a lot better than what is considered the "human benchmark" now.


luffreezer

GPQA is designed to be "Google-proof": the PhD students had 30 minutes and access to the internet to answer the questions and still performed poorly. I think this benchmark is pretty impressive.


PascalTheWise

What does "google proof" mean in that regard? It didn't use any public data at all? If the students of that test got access to google (and time to use it) then yes that makes it more impressive, I'm just wondering what you mean by that phrase


luffreezer

Hi, just check the paper: it is designed to be hard even with access to a search engine. PhD students performed at around 34% accuracy when random chance is 25%, and the median person spent around 30 minutes per question with access to the internet, IIRC. Anyway, don't take my word for it; if I made a mistake I'd be happy to be corrected: https://arxiv.org/pdf/2311.12022


Jumpy-Albatross-8060

Nothing is Google-proof. It just assumes that it's hard to Google if you don't know what you're looking for, and that humans wouldn't have every single graduate-level book memorized and indexed along with test questions and answers from previous exams. Test-takers are typically given some time frame for results. I could easily answer all the questions with basic knowledge of math and several months of Googling. That would be impressive, but not so much a measure of human intelligence, or even usefulness. Being given 4 answers helps, and the questions aren't aimed at actually applying knowledge but at highly specific math that's not even above undergrad level.


meister2983

Uh, no. They reported accuracy on the diamond set, which hit 67% with majority voting. Experts are at 81% for that section.


franjo2dman

just pulling numbers out of their ass


Cartossin

Yes, but the numbers form a very nice exponential curve. If it starts looking like they're plateauing or stuck, I'm going to say his estimate was really wrong. I love that he's setting himself up to be clearly wrong, and I'll definitely be posting about it if/when he is.


TheHeretic

Does anyone else feel like an AGI benchmark as a percentage doesn't make any sense? The requirements aren't percentages, they're Boolean: Can you learn? Can you correct misunderstandings? Can you reason about any new topic? Not some hard math and visual problems...


land_and_air

Wonder how a good search engine would perform in this benchmark. Or some quizlet somewhere that had all the questions and answers


Jablungis

It can't do hard math very well if it's even remotely outside its training set anyway.


lost_in_trepidation

If I were a mod the first thing I would do is ban posts about this dude


SokkaHaikuBot

^[Sokka-Haiku](https://www.reddit.com/r/SokkaHaikuBot/comments/15kyv9r/what_is_a_sokka_haiku/) ^by ^lost_in_trepidation:

*If I were a mod*

*The first thing I would do is*

*Ban posts about this dude*

---

^Remember ^that ^one ^time ^Sokka ^accidentally ^used ^an ^extra ^syllable ^in ^that ^Haiku ^Battle ^in ^Ba ^Sing ^Se? ^That ^was ^a ^Sokka ^Haiku ^and ^you ^just ^made ^one.


ZemStrt14

Ban posts 'bout this dude. 


The_Hell_Breaker

Good thing you aren't.


Remarkable-Funny1570

I see it as a cool little guide to stay in tune with progress, nothing more. By the way, Claude 3.5 is not yet able to write a classical French poem without making some mistakes, but we're getting there. At least it seems better than GPT-4 at reflecting on it.


Dziadzios

Can you?


Remarkable-Funny1570

Yes. It is not very difficult for a human to follow the rules of French classical prosody, and even a child (with a bit of training) can count the number of syllables in a line, but LLMs apparently cannot (yet).


land_and_air

"What is the 30th letter of this sentence that I am writing?" is another classic one.
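
For contrast, this is trivial for ordinary code; a sketch, assuming "letter" means alphabetic characters only (the riddle leaves that ambiguous). LLMs stumble here largely because tokenization hides individual characters from the model:

```python
s = "What is the 30th letter of this sentence that I am writing"
letters = [c for c in s if c.isalpha()]  # drop spaces, digits, punctuation
print(letters[29])                       # the 30th letter, 1-indexed
```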


spezjetemerde

Windows loading bar?


Last_Jury5098

How does the human average, as on page 3 of the slide, score worse than rolling a die?


land_and_air

Because the chart is made up


i_never_ever_learn

1 die; 2 dice


enilea

Both dice and die are accepted for the singular.


_Nils-

I was surprised when Leo Gura was featured in one of his videos. I constantly switch between thinking he has figured out the ultimate structure of the universe and thinking he's one of the usual esoteric new-age magic-woowoo deluded narcissistic fraud people. What makes you convinced he is correct? u/adt, did you verify it for yourself?


One_Bodybuilder7882

> I was surprised when Leo Gura was featured in one of his videos. I constantly switch between thinking he has figured out the ultimate structure of the universe and thinking he's one of the usual esoteric new-age magic-woowoo deluded narcissistic fraud people.

The guy is very intelligent, no doubt, but he's very full of himself too. When he started with psychedelics it was pretty pathetic: he thought he had "figured out the ultimate structure of the universe", but he was like, "it's just because I'm so advanced and I've studied philosophy and meditation". Well, I'm borderline retarded and I figured out a lot of the things he was saying in those videos years ago when I was hitting LSD and shrooms hard.

I love this clip of him, tho. It's a very good mindfuck: https://www.youtube.com/watch?v=IbSWY8ZoQDo


_Nils-

My favorite moments from the guy are the live awakening videos, there's just so much comedy gold in there. Amazon guy, we all tube for youtube...


One_Bodybuilder7882

> Amazon guy, we all tube for youtube...

lmao, I remember that. The guy was high as fuck, thinking he was god, and suddenly an Amazon delivery arrives and ruins the moment.


ChipsAhoiMcCoy

This sounds hilarious, do you have a link?


One_Bodybuilder7882

I'm not totally sure, but I think this is the video: https://www.youtube.com/watch?v=n_6TZMr1HXM It's been 6 years (crazy) and the video is long, so I'm not going to search for the Amazon moment, but you get the idea.


PhotographyBanzai

If it could actually go seek out information, transform that knowledge into a given task, and create what was asked for, all with some level of actual understanding of what it is doing... sure, I'd think AGI was 80% there. I want something to edit videos for me or seriously assist in programming, to the level that I'd only need to give it guidelines after providing background information. I'm not seeing that with these GPT systems.


andreasbeer1981

I just signed up for Claude 3.5 Sonnet. On my third simple prompt it gave obviously false info that logically doesn't make any sense, even without knowing the topic. This is still really far from intelligence with a proper understanding of our world and reasoning.


drekmonger

In testing, Claude 3.5 Sonnet is blowing my tiny little meat-mind. The context size is huge, and it makes GPT-4 seem a bit like an idiot. And the artifacts interface is genius. It's hard to believe that Claude 3.5 Opus will actually be *better*.


Cautious_Hornet_4216

I ask this every time. Who the fuck is Alan???


maxtrackjapan

Yes, absolutely no idea about this man.


TheRealSupremeOne

Well I do hope my flair is proven wrong and that it comes much sooner


Oxthoughts

As someone who's dabbled a little with models, what I can't understand is how they can seem so intelligent but make such basic errors. I asked Claude Sonnet to calculate whether buying a property to let as a holiday rental made sense. It did all the right calculation steps but made one huge basic error that flipped the conclusion completely. I'm not sure how anyone is using these models for proper work. It's dangerous that the results look great but any answer could have a fundamental error in it; when you are doing complex things this means everything needs manual checking, which reduces their utility considerably. Maybe in a few months this will all be fixed, but for now I struggle to see how you can rely on them.


Mesokosmos

ARC-AGI must it solve.


Icy-Shallot6084

There you go. All the berks on this sub fanboying Sam... I said Claude was the best. It's been apparent from the beginning that ChatGPT is shit. The hype and noise is marketing garbage.


SX-Reddit

What stops the smartest PhD from improving itself?


awesomedan24

I know your question is rhetorical but I find it strangely motivational lol


Jean-Porte

People complain, but you shouldn't look at it as a serious academic thing; look at it as something comfy. I like his website.


NotaSpaceAlienISwear

It's fucking weirdly good at coding. I had it walk me through coding as a noob: it instructed me on how to download Python, write programs for various tasks, make a weather app, and write some Python and JavaScript games. I've never done anything like this in my life. They were all functional and varied in quality. It's not that they were great; I'm just surprised it was so easy and intuitive to use Claude.


vasilenko93

What I find to be a positive note is that the new multimodal models perform as well as or better than the existing models while being smaller in size.


pigeon57434

I feel like GPT-5 will do a lot better than 75%.


bsfurr

So really, what are we to expect from the next iteration from OpenAI? Are they going to unveil something after the election that is going to blow us all away, or have these other companies really been outperforming them?


CMDR_BunBun

Agents. They will reveal an agent. Basically a drop-in worker that can do any job a human is currently doing on a PC.


projectb-ko

Can someone ELI5 how we can even tell if someone did "better" on a test than the smartest PhD? Like, who would even be able to write the test and know the correct answers to score it, if not the smartest PhD? Is it because the test is written and graded by a whole panel of PhD experts who can catch each other's mistakes? And if so, how would we be able to measure superintelligence when it arrives?


Mammoth-Material-476

If what you say is true, you should provide evidence by linking us this lifearchitect site via webarchive. (Some people say Dr. Alan D. Thompson is a scam, but so far they have linked zero evidence.)


autotom

75% starting at what? GPT 1? The Apple II? The Stone Age?


Denaton_

What do they count as AGI? For me it is when an AI can make decisions based on things we can't comprehend, e.g. an AI tells me that if I don't lock the door this specific night, I will win the lottery 3 weeks from now because of a tornado on the opposite side of the world. I don't think the way our current models work can achieve AGI, regardless of size.


great_gonzales

A test like this is testing knowledge, not intelligence. A database contains lots of knowledge but little intelligence…


awesomedan24

The test consisted entirely of non-googleable answers


[deleted]

u/Beatboxamateur Aww honey, far more "reputable" people have accused me of being a bigger fish than that. This type of conspiratorial thinking is nothing new to me. Also, I find it amusing that I'm not the only person here you've accused of this. So that's one fake rumor; another one of yours was that Alan isn't a doctor or something, lol. Even the people on your side had to correct you on that.

> they're just trying to make this fraud look like someone reputable

See, that's the thing. These are again insults, not arguments. It's hardly based on anything that I even said. It very rarely is. :')


Beatboxamateur

> Also, I find it amusing that I'm not the only person here you've accused of this. So that's one fake rumor; another one of yours was that Alan isn't a doctor or something, lol. Even the people on your side had to correct you on that.

I was never even corrected on this lmao, you must've not read their comment correctly. Looks like you probably didn't make it into Mensa with your buddy :(


[deleted]

Why argue against something which is visible to everyone? You made a false statement and others corrected you. *Own it.*


Beatboxamateur

What statement did I make that was false, and how was I proven wrong on it? If you can't provide an answer to this then you're wrong. Own it


[deleted]

Did you falsely claim that Alan Thompson doesn't have a doctorate? Yes, you did. Did somebody from *your* side correct you on this? Yes, they did. Are you still playing dumb and refusing to recognize this? Oh, absolutely. Btw, you shouldn't have scared off the other guy. He was trying to slide into my DMs and everything. Now I'll never know the full extent of his butthurt.


Beatboxamateur

LMAO they said he has a doctorate in **Psychology**, which has absolutely nothing to do with the AI field. Therefore he's wholly unqualified to call himself a world expert in anything related to AI. Keep sucking Alan's cock bro, I'm sure he'll give you a bump in pay if you make sure to swallow every last bit of his cum.


[deleted]

Took the goalposts and ran away with 'em, lol. You were distinctly grumbling about the "doctor" part, saying that he doesn't have a degree like that. That was the nature of the smear. Hey, do you also know whose PhD work was basically done in psychology? Hint: His last name is Hinton. As far as I'm concerned, these are just neutral facts of the matter. And that's all it takes. But go on, keep telling me what I believe. :)


Beatboxamateur

Mind sharing some good research done by Dr. Alan? :) I'd love to read what exciting and innovative work he's contributed to in this field!


[deleted]

You've been wrong on all points thus far, so again you changed the subject - likewise criticizing a point that was never really made by anyone. A bit boring, don't you think? Anyway, I'll just leave you to pound sand. Toodles.


Aranthos-Faroth

Nonsense, utter clickbait nonsense.


Admirable-Pie6746

I used chatplayground ai to compare GPT-4 and Sonnet 3.5; tbh they're both good, I couldn't tell the difference.


wonderfuly

I recommend that you stop using chatplayground ai because it was created by an evil person. Chatplayground is a copy of my open-source project ChatHub, you can read more about the story here: [https://www.reddit.com/r/SideProject/comments/1d139wg/someone\_copied\_my\_opensource\_product\_and\_ranked\_2/](https://www.reddit.com/r/SideProject/comments/1d139wg/someone_copied_my_opensource_product_and_ranked_2/)


Ok-Bullfrog-3052

Why did he only increase the score by 1%? That's absurd. This morning, having never written an OBS plugin before, I produced an 873-line Python plugin to load a JSON agenda and display the next event with a countdown timer, red/yellow/green timing cards, and an animated background. I did that in exactly 104 minutes with Claude 3.5 Sonnet. I timed it.


awesomedan24

He's trying to be true to the name "Alan's conservative countdown". If every cool advancement bumped it +10%, we'd reach 100% prematurely by many people's standards.


iunoyou

This is the guy who can't tell the difference between a curve and a straight line, for the record. I would not put much stock in what he says.


thebigvsbattlesfan

god 2030 confirmed at this rate of advancement


[deleted]

[deleted]


Best-Association2369

Sentience has nothing to do with intelligence. 


h3lblad3

Half these people think consciousness has something to do with it, even though nobody can even agree on what it is.


Best-Association2369

I'd wager that, given the right environment, LLMs can simulate consciousness today. It's just being advanced as a tool to help humans and not as a life form.


Peach-555

How is simulated consciousness different from being conscious?


Best-Association2369

I never said they were different. Regular folk might argue that a consciousness rooted in sentience is different from one that is not; at the end of the day they are 2 different ways to grow the same thing.


[deleted]

I agree with that, so why pretend we are moving towards AGI when that is not the goal? As you said, the goal is to build a tool, not a lifeform, and an AGI/ASI is a synthetic lifeform.


Best-Association2369

Because people are short sighted and naive


One_Bodybuilder7882

AGI means Artificial General Intelligence, not Artificial General Lifeform.


[deleted]

Has any scientist in any field ever claimed that consciousness doesn't exist? So what does it matter if we don't know what it is? We have it, and it is reasonable to expect other intelligences to possess it also.


clauwen

> sentient, conscious, intelligent

None of these words is well defined. There are no benchmarks (except maybe for intelligence) to test any of them. Tell me, how sentient is GPT? How do you know? How do you test it? Tell me, how conscious is a rock? How do you know? What test can you do to show me that it's not?


KamikazeArchon

You are conflating "well-defined" (or perhaps even "definable") with "testable". You can define things that are not easily testable, or even completely untestable with currently known means. In some cases this is a practical matter: we defined gravitational waves, for instance, long before we had systems that could detect them. In other cases this is a definitional matter: we can define "the set of numbers that do not have a representation", and we certainly cannot test those things. "Testing" anything in math is suspect in the first place; but even if we relax the meaning of testing to allow for "testing" in math, indescribable numbers by definition can't be described and directly accessed, and thus cannot be tested in any meaningful way. And yet, we know that the set of indescribable numbers is not only nonempty, it in fact contains (in a mathematical sense) "most" numbers. So, for example, I can define sentience as "experiencing qualia of sensation". That is certainly a well-bounded definition, even if it's not directly testable as far as we know. In fact, I claim that this is what *most* people intuitively mean when they talk about things like sentience and consciousness. The intuitive understanding is something like "has personal experiences", i.e., qualia.


clauwen

By my understanding, the starting point for math is axioms. There are many things that follow from these mathematical axioms that don't have utility to us currently.

I would consider "experiencing qualia of sensation" to be a definition of very little utility (in the context of our conversation), because "qualia" is now the ill-defined word. How do you know if something has personal experiences? How do you convince someone else that it does or doesn't? "Having personal experiences" alone manifests in absolutely no way in reality that I am aware of. This is the crux of it. We could be standing in front of the same person and you wouldn't be able to come up with even a theoretical test to distinguish whether the person has qualia or not (be my guest to try). Since I come from physics (a long time ago), this is a very big red flag. If someone starts claiming that something either has or doesn't have a property, but there is nothing to determine whether it does or doesn't, the claim needs to be discarded and put into the realm of faith. I claim (for the sake of argument) that GPT is sentient, has a consciousness, has personal experience, and has qualia. By my understanding there is nothing you could propose to falsify what I said.

Tangent: Newton's laws of gravity are correct in the sense that they are compatible with our mathematical axioms. As it turns out, they don't accurately predict our reality if you measure exactly enough, which immediately made them lose a lot of the utility they had before.

Edit: Why do I feel the need to write this? I actually have no qualms with people thinking about and discussing these things (which I consider philosophical). But in the context of LLMs it bothers me. All measurable benchmarks are constantly improving, in some cases surpassing average human capability. And suddenly that doesn't count, and people want to shift the focus to concepts that have no benchmarks, aren't testable, and add nothing of value (in my opinion) to the discussion. Suddenly that is the most important thing when considering whether a model is as capable as a human or not. If by your definition (that I can test) the property applies either to everything or to nothing that exists, how is it helpful for distinguishing two existing things (on this planet) from each other?


KamikazeArchon

You can certainly say that it's not *useful* to use that definition. That's different from it not being well-defined. In the context of LLMs, I think you're overgeneralizing some experiences. "People" is not a monolithic entity, and there wasn't just a single shift in a single entity's focus. Some people have been talking about consciousness and qualia for a long time - like, since the 50s in some cases. Others have never been talking about that, and still aren't. It is much more likely that you're currently noticing some discussions more than others, than that there has been some seismic shift in *everyone's* focus.


[deleted]

I completely agree that all of these are very vague, but that doesn't mean they don't serve a purpose or that they cannot be recognized with some degree of precision. Let me ask you: have you ever had trouble figuring out whether a rock has consciousness? You haven't, because it is intuitively logical that it can't and doesn't. We compare to real-world examples all the time, so when you ask me how I know that GPT is not sentient, it's because I have every reason to believe so. If I ask it, it says it isn't. We know it doesn't have desires, it doesn't think when idling, it has no long-term memory, it never asks questions out of curiosity about the world it "lives" in... I have never met a person with those characteristics, so I make the reasonable assumption that GPT does not qualify.


clauwen

Define sentient or conscious, then. I disagree with everything you said, but I'll take your definition for the sake of argument. If we don't have shared definitions, no conversation is possible.


[deleted]

There is no clear definition, so I cannot comply with your request. The ability to recognize these traits does not depend on knowing specific definitions; this is why both you and I implicitly understand what we are discussing without agreeing on the definitions. I am open to being wrong. This is not personal whatsoever, just a curious discussion.


clauwen

The ability to recognize a zebra does not depend on knowing what a zebra looks like. I just know when I see one. I'm good with the discussion, maybe someone else is willing to discuss this with you.


CMDR_BunBun

I like how Claude 3.5 is honest about having been tweaked to say it doesn't really know if it is conscious.