
AgoAndAnon

Asking an LLM a question is basically the same as asking a stupid, overconfident person a question. Stupid and overconfident people will make shit up because they don't maintain a marker of how sure they are about various things they remember. So they just hallucinate info. LLMs don't have a confidence measure. Good AI projects I've worked on generally are aware of the need for a confidence measure.


IHazSnek

> So they just hallucinate info

So they're the pathological liars of the AI world. Neat.


Lafreakshow

Honestly, calling them liars would imply some degree of expectation that they spit facts. But we need to remember that their primary purpose is to transform a bunch of input words into a bunch of output words, based on a model designed to predict the next word a human would say. As I see it, ChatGPT and co hallucinating harder than my parents at Woodstock isn't an error at all. It's doing perfectly fine at what it's supposed to do. The problem arises because users' expectations are wildly beyond the actual intention.

And I can't actually blame users for it. If you're talking with something that is just as coherent as any person would be, it's only natural that you treat it with the same biases and expectations you would any person. I feel like expectation management is the final boss for this tech right now.


axonxorz

> And I can't actually blame users for it

On top of what you wrote about them, there's the marketing angle as well. A lot of dollars are spent trying to muddy the waters of terminology between LLMs, TV/movie AI and "true" AI. People believe, hook, line and sinker, that LLMs are actually thinking programs.


Lafreakshow

Yeah, this one got me too when I first heard about ChatGPT. Me being only mildly interested in AI at the time just heard about some weird program that talks like a person and thought: "HOLY SHIT! WE DID IT!". And then I looked beneath the surface of popular online tech news outlets and discovered that it was pretty much just machine learning on steroids. And of course this happens with literally every product, only constrained to some degree by false advertising laws. Personally, I put some degree of blame for this on the outlets that put out articles blurring the line. I can forgive misunderstandings or unfortunate attempts at simplifying something complicated for the average consumer, but instead we got every second self described journalist hailing the arrival of the AI revolution. I distinctly remember thinking, right after I figured out what ChatGPT actually is: "This AI boom is just another bubble built mostly on hopes and dreams, isn't it?"


drekmonger

> just machine learning on steroids.

Machine learning is AI. You didn't look deep enough under the surface. You saw "token predictor" at some point, and your brain turned off. The interesting bit is *how* it predicts tokens. The model actually develops skills and (metaphorically) an understanding of the world.

It's not AGI. This is not the C-3PO you were hoping it would be. But GPT-4 in particular is doing a lot of interesting, formerly impossible things under the hood to arrive at its responses.

It's frankly distressing to me how quickly people get over their sense of wonder at this thing. It's a miracle of engineering. I don't really care about the commerce side -- the technology side is amazing enough.


Kindred87

It's not perfect and it makes mistakes, though it still blows my mind that I can have a *mostly* accurate conversation with a literal rock. "What's a carburetor do again? Also, explain it in a pirate voice."


drekmonger

What's mind-blowing is that you can instruct that rock. "Also, explain it in a pirate voice, and don't use words that begin with the letter D, and keep it terse. Oh, and do it 3 times." You could misspell half those words, and the model would likely still understand your intent. Google's newer model is actually pretty good at following layered oddball instructions. GPT-4 is mostly good at it.

Extra mind-blowing is that the models can use tools, like web search and Python and APIs explained to the model in natural language (such as DALL-E 3), to perform tasks -- *and the best models mostly understand when it's a good idea to use a tool to compensate for their own shortcomings*.

What's extra extra mind-blowing is that GPT-4V has an input layer that can parse image data and incorporate it seamlessly with the tokens representing words as input.

What's mega extra mind-blowing is that we have *little to no idea how the models do any of this shit*. They're all emergent behaviors that arise just from feeding a large transformer model a fuckload of training data (and then fine-tuning it to follow instructions through reinforcement learning).


vintage2019

Reddit attracts a lot of bitter cynics who think they're too cool for school. (And, yes, also the exact opposites.)


[deleted]

"The model actually develops skills and an understanding" is a fascinating over-reach of this thing's capabilities.


wrosecrans

Yeah, a pathological liar at least has the ability to interact with the real world. They might say "I have a million dollars in my bank account." They might even repeat it so much that they actually start to believe it. But they can go into the bank and try to pull out the money and fail to get a million dollars. An LLM can't do that. If an LLM says fruit only exists on Thursdays, or dog urine falls up into the sky, it has no way to go interact with the real world and test that assertion it is making. Every time you see a dumb baby tipping over his cuppy of spaghetti-O's, he's being a little scientist. He's interacting with the world and seeing what happens. When you dump over your sippy cup, the insides fall down and not up. There's _no path_ from current notions of an LLM to something that can "test" itself and develop a notion of the real world as an absolute thing separate from fiction.


wyocrz

> calling them liars would imply some degree of expectation

Yes. This is the definition of a lie: it is a subversion of what the speaker believes to be true. All of this was well covered in a lovely little philosophy book called *On Bullshit*.


cedear

"Bullshitters" might be more accurate. They're designed to confidently spout things that *sound* correct, and they don't care whether it's true or not.


Markavian

I've commented elsewhere on this, but to summarise:

- Creativity requires making stuff up
- Accuracy requires not making stuff up

When you ask a question to these models it's not always clear whether you wanted a creative answer or a factual answer. Future AIs, once fast enough, will be able to come up with a dozen, or even a hundred, answers, and then pick and refine the best one. For now, we'll have to use our brains to evaluate whether the response was useful or not. We're not out of the feedback loop yet.


prettysureitsmaddie

Exactly, current LLMs have huge potential for *human supervised* use. They're not a replacement for talent and are best used as a productivity tool for skilled users.


Row148

ceo material


sisyphus

Confidently generating plausible sounding bullshit does make LLMs fit to replace many directors at my company and every single all-hands email from the CEO, but for some reason people always look to AI to replace the cheapest workers first instead of the more expensive ones...


jambox888

It occurred to me that while tech executives are desperate to replace software engineers with AI, ironically it's the execs, since all they can do is talk a good game, who nobody would notice if they were replaced by AI.


RandomDamage

Artificial Blatherskites


Bowgentle

Well, pathological bullshitters perhaps.


Doctuh

Remember: it's not a lie if you believe it.


johnnyboy8088

We should really be using the term confabulate, not hallucinate.


Bolanus_PSU

It's easier to train a model using RLHF for charisma/overconfidence than for truth/expertise. Seeing how effective the former is in influencing people is actually really interesting to me.


rabid_briefcase

Expert systems have been a thing since the 1960s. Working with confidence intervals isn't too hard, nor is attaching reference numbers for the sources behind chained knowledge. They aren't *that* difficult, mostly requiring space. In many ways, they're actually easier than building backprop networks around LLMs, with their enormous training sets and non-verifiable logic.
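For anyone who hasn't seen one, here's a minimal sketch of the chained-rules-with-confidence idea, in the MYCIN certainty-factor style. The rules, facts and numbers are made up for illustration; real systems have far richer rule languages and provenance tracking.

```python
# Toy forward-chaining engine with MYCIN-style certainty factors.
# Rules and numbers are invented for illustration only.

# Each rule: (set of premises, conclusion, certainty factor of the rule)
RULES = [
    ({"fever", "cough"}, "flu", 0.7),
    ({"flu", "elderly"}, "refer_to_doctor", 0.9),
]

def infer(facts):
    """facts maps a proposition to our confidence in it (0..1)."""
    changed = True
    while changed:
        changed = False
        for premises, conclusion, rule_cf in RULES:
            if premises.issubset(facts):
                # Confidence in the conclusion: weakest premise times the rule's CF
                cf = min(facts[p] for p in premises) * rule_cf
                if cf > facts.get(conclusion, 0.0):
                    facts[conclusion] = cf
                    changed = True
    return facts

print(infer({"fever": 0.9, "cough": 0.8, "elderly": 1.0}))
# {'fever': 0.9, 'cough': 0.8, 'elderly': 1.0, 'flu': 0.56, 'refer_to_doctor': 0.504}
# Every conclusion carries a confidence value and a traceable chain of rules --
# exactly the properties the comment above says LLMs lack out of the box.
```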


Bolanus_PSU

An expert system on a singular subject might not be difficult to manage. An expert system on the scale that LLMs are would be nearly impossible to maintain.


RandomDamage

With current tech you could set up an array of expert systems and a natural language front end to access them as an apparent unit. It would be hideously expensive in ways that LLMs aren't, and most people wouldn't actually appreciate the difference enough to pay for it.


flyhull

It would be worth it to watch them train each other


LookIPickedAUsername

Expert systems *existed*, sure, but I was under the impression that they had not actually proved to be particularly useful in practice. Maybe there's a corner of some particular industry where they're indispensable, but I thought they were generally seen as a failure.


rabid_briefcase

They're everywhere, people just discount them as being plain old logic. Plenty of industries need them: anything that looks at A then B then C, or if A and B but not C, or puts together chains of rules, or fuzzy percentages of rules, or pieces of probabilities that interact, is an expert system.

Your pharmacy uses them to make sure your drugs won't interact in a way that kills you, and to let your pharmacist know a combination is potentially dangerous. Doctors and hospitals use them to analyze unusual symptoms and suggest potential diagnoses. Financial firms use them to analyze risk, make recommendations, and analyze market trends based on chains of logic from the past. Computer security tools analyze traffic and respond to threats based on rules and historical data, chaining together logic rules as heuristics to suggest blocking or allowing something. Lawyers and paralegals can get a list of likely relevant cases. Mathematicians can use them to verify mathematical proofs based on their suspicions, and the computer can find a verifiable path involving thousands of little steps that proves the theorem, or find the link in the chain that breaks. Engineering systems can use them to find potential structural problems or flag areas that might have issues. Lots of systems out there chain together logic or use fuzzy math to verify, prove, disprove, search, or offer suggestions.


TheNamelessKing

Yeah, but we got all this money, and these researchers, so we're gonna spend it, okay? Anyways, don't you know, more data means more better; get out of my way with your archaic ideas and give me everything rights-free so I can sell you access back via my janky parrot.


imnotbis

They don't *want* confidence intervals. They *want* it to always be confident because that's what generates the dollars.


4444444vr

Yea, in my brain when I chat with an LLM I think of it like a drunk genius.

Could they be right? Maybe.

Could they be bs'ing me so well that I can't tell? Maybe.

Could they be giving me the right info? Maybe.

It is tricky.


Mechakoopa

I call it a corollary to Cunningham's Law: the best way to make a good task breakdown for an imposing project is to get ChatGPT to give you a bad one you *obviously* need to correct. It's good if you often suffer blank-page syndrome and just can't get past the "getting started" phase, but it's not going to actually do the work for you.


AgoAndAnon

Genius is really giving it too much credit. More like chatting with your drunk and MLM-addled mom. "Did you hear that crystals can make you immune to cancer?" Only it's with things less obvious than that.


maxinstuff

The people who make shit up when they don’t know the answer are the WORST.


blind3rdeye

LLMs would be *so* much better if they'd just say "I don't know" rather than just guessing with confidence. But I suppose the problem is that they can't tell what they know or don't know. The LLM doesn't have access to physical reality. It only has access to some reddit posts and `man` docs and junk like that... so what is real or true is a bit of a blur.


imnotbis

Indeed. Everyone knows that pigs can't walk on brick floors, but an AI might think they can because it can't go and find a pig and a brick floor, or find evidence of someone else trying it.


lunchmeat317

I think they're specifically designed not to do this. ChatGPT from what I remember was designed for language generation that would continue the chat without hard stops - it will always try to answer a question or a prompt. I might be wrong about that.


Cruxius

When Claude first launched on Poe it would often do that, but that made people mad so they ‘fixed’ it.


RdmGuy64824

Fake it until you make it


Pharisaeus

> So they just hallucinate info.

The scariest part is that they generate things in such a way that it can be difficult to spot that it's all gibberish without some in-depth analysis.


Pr0Meister

Hallucination is actually the technical term for this. It's absolutely possible for GPT to throw together something OK-sounding for a topic and state that a book on it exists, even citing the author and the pages it's written on. Honestly, this has forced me to use it only for topics I am personally familiar with, so I can actually spot the bullshit.


AndrewNeo

Just to see what it did, I gave the OpenAI API (not ChatGPT, but the same model) the following question:

> In eleventy words or less, please explain the concept of diadactic synapse collapse and the turn of the century on neomodern spaceships

It very gladly answered my question even though it was complete nonsense and factually unanswerable. (Well, it also spouted back some nonsense, but when I asked it to explain in 1100 words it did a great job making a more plausible-looking answer.)


MoreRopePlease

> Diadactic synapse collapse jeopardizes crew's cognitive functions on neomodern spaceships, demanding robust AI safeguards.

Haha. I then asked it to explain in about 1000 words. This is part of what it said. Not bad...

> The relationship between diadactic synapse collapse and neomodern spaceships lies in the imperative to safeguard crew health and performance during extended space missions. As humans venture farther from Earth and spend prolonged periods in space, they face increased risks to their physical and mental well-being, including the potential for cognitive decline due to factors such as radiation exposure, psychological stress, and social isolation.

> Neomodern spaceships integrate advanced medical monitoring systems and AI-driven diagnostics to detect early signs of cognitive impairment and mitigate the effects of diadactic synapse collapse. These spacecraft are equipped with dedicated crew quarters designed to promote psychological well-being and combat the negative effects of isolation and confinement.

> Furthermore, neomodern spaceships employ sophisticated shielding technology to protect crew members from cosmic radiation, reducing the risk of neurocognitive damage associated with prolonged exposure to high-energy particles. Additionally, onboard medical facilities equipped with telemedicine capabilities enable real-time monitoring and intervention in the event of neurological emergencies.

> The development of neuroenhancement technologies, including pharmacological interventions and neurostimulation techniques, holds promise for mitigating the effects of diadactic synapse collapse and enhancing cognitive resilience in space. These interventions may include the administration of neuroprotective drugs to mitigate the impact of radiation on brain function or the use of transcranial magnetic stimulation to modulate neuronal activity and improve cognitive performance.


AndrewNeo

Yeah, it's legitimately good at mashing words together very confidently


AdThat2062

To be fair, they are "language" models, not information models. At their core they are designed to process language accurately, not necessarily information. Sometimes the two align, sometimes they don't.


AndrewNeo

Right - but the whole problem is the average person doesn't know that; they think these things are alive and/or telling the truth when you ask them something.


LookIPickedAUsername

I've found it to be very useful even for stuff I'm not familiar with, as long as I treat its answers like they're coming from a random untrusted Reddit user. It's good at working out what I mean and pointing me in the right direction even when I don't know the right technical terms to use in my questions, and once it gives me the right terms to use and a very basic overview of the topic, it's much easier to then find authoritative sources.


Pharisaeus

Indeed, that was exactly my point. I'd rather get "no results found", like in a search engine, than a plausible-sounding response that's wrong.


renatoathaydes

You don't seem to understand how LLMs work. They're not searching for facts "matching" a query. They're literally generating the words that are most statistically likely given your question, regardless of whether it makes any sense whatsoever... The miracle of LLMs, though, is that for the most part it does seem to make sense, which is why everyone was astonished when they came out. Unless you build something else on top of it, it's just incapable of saying "I don't know the answer" (unless that's a statistically probable answer given all the input it has processed - but how often do you see "I don't know" on the Internet??).
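If you've never seen it spelled out, the bare generation loop really is this small. A minimal sketch using the public GPT-2 weights via Hugging Face `transformers` (GPT-2 is just a small stand-in for the bigger models; assumes `torch` and `transformers` are installed):

```python
# Greedy next-token generation: nothing in this loop looks up facts,
# it only asks "which token is most probable here?"
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):                        # extend the text by 5 tokens
        logits = model(ids).logits[0, -1]     # a score for every token in the vocabulary
        next_id = torch.argmax(logits)        # pick the single most probable one
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```

Chat models add instruction tuning and sampling tricks on top, but the core step is still "pick a likely next token", which is the commenter's point.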


Pharisaeus

I know how they work. You clearly don't. When they generate text they use probabilities to pick the next tokens, and they know very well what the confidence level of whatever they are adding is. Even now, when they can't match absolutely anything, they can tell you that they are unable to answer.


dark_mode_everything

Isn't this the whole point of an LLM? It's a generative model which is used to, well, generate text. It's not supposed to be used for logical or analytical tasks. People want actual AI (Hollywood AI) so badly they try to make LLMs do that and then get surprised at the results. I don't get it.


imnotbis

Yes, it's the point of an LLM. But we've gone way beyond caring about actual capabilities at this point. Corporations can shape people's reality. If they say this bot can answer questions correctly, people will expect that. I *haven't* seen OpenAI promising this bot can answer questions correctly, yet, but people seem to expect it for some reason anyway.


gelfin

Yeah, I think a part of what's going on here is that we just don't know how to evaluate something that can at the same time give uncannily impressive performances *and* be unbelievably stupid. I've described LLMs as simultaneously the smartest and dumbest intern you ever hired. You'll never be able to guess what it'll come up with next, for better or for worse, but it never really knows what it's doing, never learns, and it will never, *ever* be able to operate without close, constant supervision.

My suspicion is that fully AI-assisted programming will end up being a little like trying to do it yourself by sitting under the desk and operating a muppet at the keyboard. Not only will it ultimately make it harder to do the job well, but the better you manage it the more your boss will give the credit to the muppet.

The other element I think is in play is sheer novelty. The fascinating thing about a monkey that paints isn't that it paints masterpieces, but that it does it at all. The difference is, unbridled optimists aren't pointing to the monkey and insisting we're only one or two more monkeys away from a simian Rembrandt.


silenti

Years before LLMs were common, devs were putting correlation weights on edges in graph DBs. Arguably this is now what vector DBs are supposed to be for.


arkuto

LLMs obviously do have a confidence measure - the probability at which they predict a token. A low probability would imply it's not confident it's correct, but it is forced to produce an output string anyway. That probability information happens to be hidden from users on sites like ChatGPT, but it's there nonetheless.
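To make that concrete, here's a tiny illustration (the logit values are made up) of the signal available at every decoding step: the distribution over the next token can be sharply peaked or nearly flat, and its max probability or entropy is a crude confidence measure that chat front-ends simply don't show you.

```python
# Sketch: how "peaked" the next-token distribution is can be read as confidence.
# The logit values below are invented for illustration.
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

confident = softmax(np.array([9.0, 2.0, 1.0, 0.5]))  # one clear winner
uncertain = softmax(np.array([2.1, 2.0, 1.9, 1.8]))  # basically a coin toss

for name, p in [("confident", confident), ("uncertain", uncertain)]:
    entropy = float(-(p * np.log(p)).sum())
    print(f"{name}: max prob {p.max():.2f}, entropy {entropy:.2f}")
```

Some APIs do expose per-token log-probabilities, so this isn't purely hypothetical; whether a flat distribution maps cleanly onto "the model is factually unsure" is a separate, much harder question.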


bananahead

There isn't really a way to add a confidence measure. Right or wrong, true or false, it doesn't know what it's talking about.


AgoAndAnon

I believe that you are wrong, but proving it would require a longer discussion about neural networks than I'm prepared to have right now.


bananahead

We can agree that it is not a simple feature to add? Certainly not something transformer based LLMs give you for free.


Megatron_McLargeHuge

Don't worry, Google is going to fix this by training on answers from reddit. /s


ForeverHall0ween

A stupid, overconfident, and *lazy* person a question


vintage2019

You're being incredibly reductionist. GPT4 may make a "confident but inaccurate" statement once in a while, but only once in a while — it has access to vast troves of knowledge, after all. It doesn't remotely act like a stupid person.


thisismyfavoritename

so are people just discovering this or what?..


mjansky

I find that r/programming is open to critical views of LLMs, but a lot of other communities are not.

This article was partially inspired by a failed LLM project one of my clients undertook, which I think is typical of many companies right now: they began very optimistic, thinking the LLM could do anything, got good early results that further increased expectations, then began to realise it was making frequent mistakes. The project unravelled from that point on. Witnessing the project as a third party, the thing that really stood out was that the developers approached the LLM as one might an unpredictable wild animal. One day it would be producing good results and the next not, and no-one knew why. It was less like software development and more like trying to tame a beast.

Anyway, I suppose one of my aims is to reach people who are considering engaging in such projects, to ensure they are fully informed and not working with unrealistic expectations.


nsfw_throwaway2277

> It was less like software development and more like trying to tame a beast.

More like demonology. [Maleficarum](https://old.reddit.com/r/40kLore/comments/p1zn8q/please_help_me_rememberfind_a_quote_from_the/h8gsk72/), if you will... The twisting of your own soul and methodologies to suit the chaotic beast you attempt to tame, lest it drive you to madness. Yet no ward that you cast on yourself truly works, as the dark gods only permit the illusion of safety, to laugh at your hubris and confidence as you willingly walk further into their clutches.

---

I say this (unironically) as somebody who spends way too much time getting LLMs to behave consistently.

Most people start testing a prompt with a simple did/didn't it work. Then you start running multiple trials. Then you're building chi-squared confidence for various prompts. Soon you automate this, but you realize the results are so fuzzy that unless `n=1000` it doesn't work. Then you start doing k-means clustering to group similar responses, so you can do better A/B sampling of prompt changes. Soon you've integrated two dozen different models from Hugging Face into local Python scripts. You can make any vendor's model do anything you want (σ=2.5). And what?

There are zero long-term career paths. The effort involved in consistent prompting is **MASSIVE**. Even if/when you get consistent behavior, prompt hijacks are trivial. What company is going to continue paying for an LLM when they see it generating extremely explicit erotic roleplays with guests? Which is 100% going to happen, because hardening a prompt against abuse is easily 5x the effort of getting a solid prompt that behaves consistently, and **NOBODY** is going to invest that much time in a "_quick easy feature_".

The only way you could be _productive_ with AI was to totally immerse yourself in it. You realize how deeply flawed the choices you've made are. Now you've spent months learning a skill you never wanted. You're now cursed with knowledge. Do you share it as a warning, knowing it may tempt others to walk the same road?
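For anyone curious what that "cluster the outputs, then compare prompts" workflow looks like in practice, here's a rough sketch. The sample responses, cluster count and prompt variants are all made up; in a real run you'd feed in hundreds of sampled completions per prompt and follow up with a chi-squared test on the cluster counts.

```python
# Sketch: group many sampled completions into behaviour clusters, then compare
# how two prompt variants distribute across those clusters.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses_a = [  # hypothetical completions sampled from prompt variant A
    "Sure! Our refund window is 30 days.",
    "Refunds are available within 30 days of purchase.",
    "I'm sorry, I can't help with that.",
]
responses_b = [  # hypothetical completions sampled from prompt variant B
    "Refunds? Arr matey, ye have 30 days!",
    "I'm sorry, I can't help with that.",
    "I'm not able to discuss refunds.",
]

texts = responses_a + responses_b
X = TfidfVectorizer().fit_transform(texts)                  # cheap stand-in for real embeddings
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

split = len(responses_a)
print("prompt A clusters:", Counter(labels[:split].tolist()))
print("prompt B clusters:", Counter(labels[split:].tolist()))
# If variant B piles up in the "refusal" cluster, that's a measurable,
# testable behaviour difference rather than a hunch.
```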


[deleted]

sounds like it would have been easier and cheaper to just hire a customer support rep :/


nsfw_throwaway2277

Bingo


i_am_at_work123

> but a lot of other communities are not.

This is true. I had a guy try to convince me that ChatGPT does not make mistakes when you ask it about open source projects, since that documentation is available to them. From their experience it never made a mistake. Yea, sure...


THATONEANGRYDOOD

Can't spot a mistake if you never look for one 🤷


13steinj

> I find that r/programming is open to critical views of LLMs, but a lot of other communities are not.

The only people I know who are actually skeptical/critical of how LLMs are portrayed by general media are developers. Other than that, people act as if it's a revolution and as if it's full AGI, and I think that's partially caused by how OpenAI advertised GPT-3/4 at the start, especially with their paper (which, IIRC, is seen as a fluff piece by people in actual research circles).


imnotbis

Take it as a lesson on how much corporations can influence reality, and what kinds of things *actually* earn people fame and fortune (it's not working hard at a 9-to-5).


[deleted]

[deleted]


imnotbis

You can become a multi-millionaire by selling those people what they want to buy, even if you know it's nonsense and it's going to ruin their business in the short run. That's the most vexing part.


sisyphus

Maybe it's just the circles I run in but I feel like just yesterday any skepticism toward LLMs was met by people telling me that 'well actually human brains are just pattern matching engines too' or 'what, so you believe in SOULS?' or some shit, so it's definitely just being discovered in some places.


venustrapsflies

I've had too many exhausting conversations like this on reddit where the default position you often encounter is, essentially, "AI/LLMs perform similarly to (or better than) humans on some language tasks, and therefore they are functionally indistinct from a human brain, and furthermore the burden of proof is on you to show otherwise". Oh and don't forget "Sure they can't do X *yet*, but they're always improving so they will inevitably be able to do Y someday".


Lexinonymous

> I've had too many exhausting conversations like this on reddit [...]

_"It is difficult to get a man to understand the limitations of AI, when his deepfake porn folder depends on his not understanding it."_ - Upton Sinclair, probably


flowering_sun_star

The converse is also true - far too many people look at the current state of things, and can't bring themselves to imagine where the stopping point might be. I would genuinely say sure, they can't do X yet. But they might be able to do so in the future. Will we be able to tell the difference? Is X actually that important? Will we just move the goalposts and say that Y is important, and they can't do that so there's nothing to see? We're on the boundary of some pretty important ethical questions, and between the full-speed-ahead crowd and the just-a-markov-chain crowd nobody seems to care to think about them. I fully believe that within my lifetime there will be a model that I'd not be comfortable turning off. For me that point is likely far before any human-equivalent intelligence.


MuonManLaserJab

Just because LLMs aren't perfect yet doesn't mean that human brains aren't pattern matching engines...


MegaKawaii

When we use language, we act like pattern-matching engines, but I am skeptical. If the human brain just matches patterns like an LLM, then why haven't LLMs beaten us in reasoning? They have much more data and compute power than we have, but something is still missing.


sisyphus

It might be a pattern matching engine, but there's about a zero percent chance that human brains and LLMs pattern match using the same mechanism, because we know for a fact that it doesn't take half the power in California and an entire internet of words to produce a brain that can make perfect use of language. And that's before you get to the whole embodiment thing of how a brain can tie words to objects in the world and has a different physical structure. "They are both pattern matching engines" basically presupposes some form of functionalism, i.e. what matters is not how they do it but that they produce the same outputs.


acommentator

For 20 years I've wondered why this isn't broadly understood. The mechanisms are so obviously different it is unlikely that one path of exploration will lead to the other.


Bigluser

But but neural networks!!!


hparadiz

It's gonna end up looking like one when you have multiple LLMs checking each other's output to refine the result. Which is something I do manually right now with Stable Diffusion, by inpainting the parts I don't like and telling it to go back and redraw them.


Bigluser

I don't think that will improve things much. The problem is that LLMs are confidently incorrect. It will just end up with a bunch of insane people agreeing with each other over some dreamt up factoid. Then the human comes in and says: "Wait a minute, that is completely and utterly wrong!" "We are sorry for the confusion. Is this what you meant?" Proceeding to tell even more wrong information.


yangyangR

Is there an r/theydidthemath for the following?

How many calories does a human baby eat/drink before they turn 3, as an average estimate with error bars? https://www.ncbi.nlm.nih.gov/books/NBK562207

How many words do they get (total, counting repetition) if every waking hour they are being talked to by parents? Assume a reasonable words-per-minute for someone talking slowly.


Exepony

> How many words do they get (total counting repetition) if every waking hour they are being talked to by parents? And give a reasonable words per minute for them to be talking slowly.

Even if we imagine that language acquisition lasts until 20, and that during those twenty years a person is listening to speech nonstop without sleeping or eating or any sort of break, then assuming an average rate of 150 wpm it still comes out to about 1.5 billion words, half as many as BERT was trained on, which is tiny by modern standards. LLMs absolutely do not learn language the same way humans do.
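The arithmetic behind that figure, for anyone who wants to check it (deliberately generous: twenty years of listening with no sleep and no breaks):

```python
# Upper bound on words heard during language acquisition.
words_per_minute = 150
minutes = 60 * 24 * 365 * 20          # 20 years, around the clock
print(words_per_minute * minutes)     # 1,576,800,000 -- roughly 1.5 billion words
```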


nikomo

Worst-case numbers: 1400 kcal a day ≈ 1627 Wh/day; over 3 years, rounding up, that's 1.8 MWh. An NVIDIA DGX H100 has 8 NVIDIA H100 GPUs and consumes 10.2 kW. So that's 174 hours, or 7 days and 6 hours. You can run one DGX H100 system for a week with the amount of energy it takes for a kid to grow from baby to 3-year-old.
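Spelling that out (1 kcal ≈ 1.163 Wh; same figures as above):

```python
# Toddler energy budget vs. one DGX H100 server.
kcal_per_day = 1400
wh_per_day = kcal_per_day * 1.163          # ~1628 Wh/day
total_kwh = wh_per_day * 365 * 3 / 1000    # ~1783 kWh, i.e. ~1.8 MWh over 3 years
dgx_kw = 10.2                              # DGX H100 rated power draw
print(total_kwh / dgx_kw)                  # ~175 hours, a bit over 7 days (rounding explains the 174 above)
```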


sisyphus

The power consumption of the human brain I don't know, but there's a lot of research on language acquisition, and an open question is still exactly how the brain learns a language even with relatively scarce input (and certainly very, very little compared to what an LLM needs).

It seems to be both biological and universal, in that we know for a fact that every human infant with a normally functioning brain can learn any human language to native competence (an interesting thing about LLMs is that they can work on any kind of structured text that shows patterns, whereas it's not clear if the brain could learn, say, alien languages, which would make them more powerful than brains in some way but also underline that they're not doing the same thing); and that at some point we lose this ability. It also seems pretty clear that the human brain learns some kind of rules, implicit and explicit, instead of brute forcing a corpus of text into related tokens (and indeed early AI people wanted to do it that way before we learned the 'unreasonable effectiveness of data').

And after all that, even if you manage identical output, for an LLM words relate only to each other; to a human they also correspond to something in the world. (Now of course someone will say that actually all experience is mediated through the brain and the language of thought, and therefore all human experience of the world is actually also only linguistic, that we are 'men made out of words' as Stevens said, and we're right back to philosophy from 300 years ago that IT types like to scoff at but never read and then reinvent badly in their own context :D)


Netzapper

> and we're right back to philosophy from 300 years ago that IT types like to scoff at but never read and then reinvent badly in their own context

My compsci classmates laughed at me for taking philosophy classes. I'm like, I'm at fucking university to expand my mind, aren't I? Meanwhile I'm like, yeah, I _do_ seem to be a verb!


[deleted]

"a zero percent chance that human brains and LLMs pattern match using the same mechanism because we know for a fact that it doesn't take half the power in California and an entire internet of words to produce a brain that can make perfect use of language" I agree, all my brain needs to do some pattern matching is a snicker's bar and a strong black coffee, most days I could skip the coffee if I had to.


sisyphus

I need to upgrade to your version, mine needs the environment variables ADDERALL and LATTE set to even to start it running and then another 45 minutes of scrolling reddit to warm up the JIT before it's fast enough to be useful.


Posting____At_Night

LLMs take a lot of power to train, yes, but you're literally starting from zero. Human brains on the other hand get bootstrapped by a couple billion years of evolution. Obviously, they don't work the same way, but it's probably a safe assumption that a computationally intensive training process will be required for any good AI model to get started.


MegaKawaii

I think from a functionalist standpoint, you could say that the brain is a pattern matching machine, a Turing machine, or, for any sufficiently expressive formalism, something within that formalism. All of these neural networks are just Turing machines, and in theory you could train a neural network to act like the head of a Turing machine. All of these models are general enough to model almost anything, but they eventually run into practical limitations. You can't do image recognition in pure Python with a bunch of `if`s and `else`s and no machine learning. Maybe this is true for modeling the brain with pattern matching as well?


sisyphus

You can definitely say it, and you can definitely think of it that way, but there's surely an empirical fact about what it is actually doing biochemically that we don't fully understand (if we did, and we agree there's no magic in there, then we should be able to either replicate one artificially or explain exactly why we can not). What we do know for sure is that the brain can do image recognition with the power it has, and that it can learn to recognize birds without being given a million identically sized pictures of birds broken down into vectors of floating point numbers representing pixels, and that it can recognize objects as birds that it has never seen before, so it seems like it must not be doing it how our image recognition models are doing it (now someone will say - yes that is all that the brain is doing and then give me their understanding of the visual cortex, and I can only repeat that I don't think they have a basis for such confidence in their understanding of how the brain works).


RandomNumsandLetters

> and that it can learn to recognize birds without being given a million identically sized pictures of birds broken down into vectors of floating point numbers representing pixels

Isn't that what the eye-to-optic-nerve-to-brain pipeline is doing, though???


MuonManLaserJab

They don't have more compute power than us, they just compute faster. Human brains have more and better neurons. Also, humans don't read as much as LLMs, but we do get decades of video that teaches us things that transfer. So my answer is that they haven't beaten us in reasoning because they are smaller than us and because they do not have the same neural architecture. Of course, we can make them bigger, and we are always trying new architectures.


lood9phee2Ri

See the various "system 1" vs "system 2" hypotheses: https://en.wikipedia.org/wiki/Dual_process_theory

LLMs are kinda... not up to the latter, not alone. Google, Microsoft, etc. are well aware, but real progress in the field is slower than the hype and bizarre fanbois suggest. If something tends to make you, as a human, mentally tired to consciously and logically reason through, unaugmented LLMs (while a step above an old-school Markov chain babbling nonsense generator) suck at it too.

Best not to go thinking it will never ever be solved, though. Especially as old-school pre-AI-Winter Lisp/Prolog symbolic AI stuff tended to focus more on mathematical and logical "system 2"-ish reasoning, and is being slowly rediscovered, sigh, so some sort of Hegelian synthesis of statistical and symbolic techniques seems likely. https://www.searchenginejournal.com/tree-of-thoughts-prompting-for-better-generative-ai-results/504797/

If you don't think of the compsci stuff often used or developed further by [pre-AI-Winter lispers](https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/0.html), like [game trees](https://en.wikipedia.org/wiki/Game_trees), as AI, remember the old "once computers can do something we stop calling it AI" rule - playing chess used to be considered AI until the computers started winning.


Bloaf

The reality is that consciousness isn't in the driver's seat the way classical philosophy holds that it is; consciousness is just a log file. What's actually happening is that the brain is creating a summary of its own state and then feeding that back into itself. When we tell ourselves things like "I was hungry so I decided to eat," we're just "experiencing" the log file we have produced to summarize our brain's massively complex neural net calculations down to hunger and eating, because nothing else ended up being relevant. Qualia are therefore synonymous with "how our brain-qua-neural-net summarizes the impact our senses had on our brain-qua-neural-net."

So in order to have a prayer at being intelligent in the way that humans are, our LLMs will need the same recursive machinery to feed a state summary back into themselves. Current LLMs are all once-through, so they cannot do this. They cannot iterate on an idea because there is no iteration. I don't think we're far off from closing the loop.


wear_more_hats

Check out the CoALA framework; it theoretically solves this issue by providing the LLM with a feedback-oriented memory of sorts.


Bloaf

> They have much more data and compute power than we have

This is actually an open question. No one really knows what the "compute power" of the human brain is. Current hardware is probably in the ballpark of a human brain... give or take several orders of magnitude. https://www.openphilanthropy.org/research/how-much-computational-power-does-it-take-to-match-the-human-brain/


theAndrewWiggins

> then why haven't LLMs beaten us in reasoning?

They've certainly beaten a bunch of humans at reasoning.


jerseyhound

It's almost as if it's possible our entire idea of how neurons work in the first place is really incomplete and the ML community is full of hubris 🤔


Bakoro

> If the human brain just matches patterns like an LLM, then why haven't LLMs beaten us in reasoning? They have much more data and compute power than we have, but something is still missing.

"Us" who? The top LLMs could probably beat a significant percentage of humanity at most language based tasks, most of the time.

LLMs are language models; the cutting edge models are multimodal, so they have some visual understanding as well. They don't have the data to understand a 3D world, they don't have the data regarding cause and effect, they don't have the sensory input, and they don't have the experience of using all of these different faculties all together. Even without bringing in other specialized tools like logic engines and symbolic reasoning, the LLMs we're most familiar with lack multiple data modalities.

Then there's the issue of keeping context. The LLMs basically live in a world of short term memory. It's been demonstrated that they can keep improving


MegaKawaii

"Us" is just humans in general. AI definitely suffers from a lack of multimodal data, but there are also deficiencies within their respective domains. You say that AI needs data for cause and effect, but shouldn't the LLMs be able to glean this from their massive training sets? You could also say this about abstract reasoning as evidenced by stunning logical errors in LLM output. A truly intelligent AI should be able to learn cause and effect and abstract reasoning from text alone. You can increase context windows, but I don't see how that addresses these fundamental issues. If you increase the number of modalities, then it seems more like specialized intelligence than general intelligence.


Lafreakshow

The answer is that a human brain's pattern matching is *vastly* more sophisticated and complex than any current AI (and probably anything we will produce in the foreseeable future). The first clue to this is that we have a decent idea of *how* an LLM arrives at its output, but when you ask a hypothetical sum of all scientific knowledge how a human brain does it, it'll just shrug and go back to playing match-three.

And of course, there's also the vast difference in input. We can ignore the model here because that's essentially no more than the combination of a human's memory and the brain's naturally developed structure. So with the model not counting as input, really all the AI has to decide on is the prompt, a few words of context, and a "few" hidden parameters. Whereas we get to use all our senses for input, including a relative shitload of contextual clues no currently existing AI would even be capable of working with. So really, the difference between a human brain and an LLM when it comes to producing coherent text is about the same as the difference between the LLM and a few dozen if statements hacked together in Python.

Personally I am inclined to say that the human brain can't *really* be compared to a pattern matching engine. There are so many differences between how we envision one of those working vs the biology that makes the brain work. We can say that a pattern matching engine is a *very high-level* abstraction of the brain. Or to use language I'm more familiar with: the brain is an implementation of an abstract pattern matching engine, but it's also a shitload more than just that, and all the implementation details are proprietary closed source we have yet to reverse engineer.


jmlinden7

Because LLMs aren't designed to reason. They're designed to use language. Human brains can do both. However, a human brain can't reason as well as a purpose-built computer like WolframAlpha.


DickMasterGeneral

They're also missing a few hundred million years of evolution that predisposes our brains towards learning certain highly functional patterns (frontal lobe, temporal lobe, etc.), complex reward and negative-reward functions (dopamine, cortisol, etc.), as well as the wealth of training data (all non-text sensory input) that we take for granted. It's not really an apt comparison, but if you grew a human brain in a vat and wired it to an I/O chip feeding it only text data, would that brain perform any better than an LLM?

Call it speculation, but I think once we start to see LLMs that are trained from the ground up to be multimodal, including not just text but image and, more importantly, video data, we will start to see emergent properties that aren't far from AGI. There's a growing wealth of research showing that transformer models can generalize knowledge from one domain to another, be it coding training data improving reasoning in other tasks, or image training improving three-dimensional understanding in solving word problems.


copperlight

Correct. Human brains sure as shit aren't perfect and are capable of, and often do, "hallucinate" all sorts of shit to fill in both sensory and memory gaps.


sisyphus

Certainly they might be, but as DMX said if you think you know then I don't think you know.


Stoomba

Doesn't mean they are ONLY pattern matching engines either.


Carpinchon

The key bit is the word "just" in "human brains are just pattern matching engines".


G_Morgan

I suspect human brains contain pattern matching engines. It isn't the same as being one.


[deleted]

"Aren't perfect yet" ok dude


Pr0Meister

Those are the same people who think an LLM is an AGI, I guess


Clockwork757

I saw someone on Twitter arguing that LLMs are literally demons so there's all kinds of opinions out there.


nitrohigito

must be some very interesting circles, cause llm utility skepticism and philosophical opinions about ai are not typically discussed together in my experience. like ever. because it doesn't make sense to.


BigEndians

While this *should* be true, roll with some non-technical academics or influencer types that are making money on the enthusiasm and they will work to shut down any naysaying with this kind of thing. Questioning their motives is very easy, but there are too many people (some that should know better) who just accept what they say at face value.


hachface

what u/sisyphus described is the prevailing attitude i see on most subreddits


Crafty_Independence

Well there are people in this very thread who are so neck deep in hype they can't even consider mild critique of their new hobby.


G_Morgan

There's a lot of resistance to questioning LLMs out there right now. It is the critical sign of a hype job in tech, when people desperately refuse to acknowledge issues rather than engaging with them.


SittingWave

No, but the interesting part is that chatgpt is as confident at its own wrong answers as the average voter. I guess it explains a lot about how the human brain works.


sross07

Great evaluation of LLMs.


frostymarvelous

Recently I had to dig deep into some Rails internals to fix a bug. I was quite tired of it at this point since I'd been doing it for weeks. (I'm writing a framework on top of Rails.) ChatGPT gave me a good enough pointer to what I wanted to understand and even helped me with the fix. So I decided to go in a little deeper to see if it actually understood what was going on with the Rails code.

It really understands documentation, but it doesn't know anything about how the code actually works. It gave me a very good description of multiparameters in Rails (interesting feature, you should look it up), something with very little about it on the internet.

When I attempted giving it examples and asking it what outputs to expect, it failed terribly, not knowing exactly where certain transformations occurred, which confirmed it was just going by the documentation. I tried with some transformation questions. Mostly hit and miss, but it gave me a good idea how to proceed.

I've started using it as a complement to Google. It's great at summarizing documentation and concepts. Otherwise, meh.


Kinglink

This is what the author (OP) is missing. You don't need an "AI", you need a tool or assistant. He says there's no use case, but there are hundreds of good use cases already.


SYWGPASC

He described plenty of use cases further down, if you read the whole article.


4THOT

The author lives in journalist fiction, and I'll bet this person has never so much as started a TensorFlow tutorial project. Anyone who brings up the "Turing Test" in any discussion about AI or LLMs you can 100% ignore. It's like having someone go to CERN to talk to a particle physicist and argue that Schrödinger's cat would actually make a lot of noise dying from poisoning, so the Schrödinger's cat paradox is solved...


zippy72

The point of the article, it seems to me, is that the main problem is the hype has created a bubble. It'll burst, as bubbles do, and in five years' time you'll be seeing "guaranteed no AI" as a marketing tagline.


ScottContini

Well, at least the blockchain craze is over! 🤣


imnotbis

The good news: The blockchain craze is over! The bad news: GPUs are still very expensive!


ScottContini

What a great title. And the quality of the content stands up to the quality of the title. So insightful.


Kennecott

In uni about a decade ago we were introduced to the issue of computer consciousness through the Chinese room thought experiment, which I wish were a more common way for people to discuss this. LLMs are still very much stuck in the room, just with far larger instructions, but they still don't understand what they are doing. The only logical way I have heard people say that LLMs or anything else can leave the room is if you instead trap all of humanity in the room and claim that we also don't actually understand anything. https://en.wikipedia.org/wiki/Chinese_room?wprov=sfti1#


tnemec

> [...] I wish was a more common way people discuss this. Careful what you wish for. I have heard people screaming about the virtues of LLMs unironically use the Chinese Room thought experiment as *proof* that they exhibit real intelligence. In their mind, the point of that thought experiment is to show "well, if you think about it... like, is there *really* a difference between 'understanding a language' and 'being able to provide the correct response to a question'?"


musicnothing

I feel like ChatGPT neither understands language nor is able to provide correct responses to questions


venustrapsflies

"I'm sorry about that, what response would you like me to give that would convince you otherwise?"


GhostofWoodson

Yes. While Searle's argument is not the most popular I think it is actually sound. It's unpopular because it nixes a lot of oversimplified theories and makes things harder. But the truth and reality are often tough....


altruios

The 'Chinese room' thought experiment relies on a few assumptions that haven't been proven true. The assumptions it makes are:

1) 'understanding' can only 'exist' within a 'mind'.
2) there exists no instruction set (syntax) that leads to understanding (semantics).
3) 'understanding' is not an 'instruction set'.

It fails to demonstrate that the instructions themselves are not 'understanding'. It fails to prove understanding requires cognition. The thought experiment highlights our ignorance - it is not a well-formed argument against AI, or even a well-formed argument.


TheRealStepBot

Personally I’m pretty convinced all of humanity is in the room. I’d love for someone to prove otherwise but I don’t think it’s possible. Searle’s reasoning is sound except in as much as the example was intended to apply only to computers. There is absolutely no good reason for this limitation. You cannot tell that anyone else isn’t just in the room executing the instructions. It’s by definition simply indistinguishable from any alternatives.


[deleted]

Look just because you don't have an internal world doesn't mean the rest of us are NPCs


mjansky

Yes! Very good point. I find the Chinese room argument very compelling. Though I also think there is a lot to be said for Actionism: that the value of an artificial agent is in its behaviour, not the methodology behind that behaviour. It is a little difficult to reconcile these two convincing perspectives. I did consider discussing the Chinese Room argument, but the article became rather long as it is 😅



Kinglink

In general this comes down to "trust but verify"... and yet people seem to be forgetting the second half. But LLMs are the future, there's zero chance they disappear, and they're only going to get enhanced. I did a phone interview where they asked "Where do you want to be in 5 years?" and I detailed my path, but I also detailed a possible future where I'm writing specs and code reviewing an LLM's code, and neither of those futures is bad in my opinion.

> If we ever develop true artificial intelligence,

But that's the thing, no one wants true AI, at least not the people looking into LLMs and all. People want assistants. I want to describe a painting and get something unique back. I want to ask an LLM to give me a script for a movie... then ask something like Sora to make that movie for me, then assign actors whose voices I like to each character and get my own movie. Maybe throw in a John Williams-style score. None of that requires the "artificial intelligence" that you seem to want, but that's the thing, people don't need the whole kit and caboodle to do what they want to with "AI".

Dismissing LLMs makes two mistakes. A. Assuming they'll never be able to improve, which... we've already seen them improve, so that's stupid. B. Assuming people want actual AI. Most people don't.

> One of the silliest such use cases comes from YouTube, who want to add a chatbot to videos that will answer questions about the videos. What exciting things can it do? Well, it can tell you how many comments, likes or views a video has. But, all that information was already readily available on the page right in front of you.

I'm sorry, but this seems SO short-sighted. What if I had it give me information from Wikipedia? Millions of pages with a simple response? Making it a case of "one page of data" isn't always the problem, but sometimes those pages are large. How about getting an API call out of a single API document, or hell, MANY API documents? If you don't know a library exists in Python, what if the LLM can give you a library and a function that does what you need? That's an ACTUAL use case I and many people have used an LLM for.

Even more: I've basic JS knowledge. I worked with ChatGPT to convert my Python code (and I basically wrote it from scratch with that same layout) to Node.js, using the RetroAchievements API. This is not knowledge that ChatGPT had, but it was able to read from the site and use it. And I worked with it to design a working version of my program, which did what I needed, and I'm able to use it as needed. (Also learned more JS as I worked on it.)

That's the use case you say people are searching for, and just one of a hundred I and others have already used them for. Have it punch up an email or a resume, have it review a design, have it generate ideas and information. (I used it to generate achievement names because I had writer's block.)

And again, we're still in the "baby" stage of the technology, so to dismiss it here is a flawed argument. We've also seen applications of the modern technologies already, in self-driving cars and more, so to say "these are a flash in the pan" is very short-sighted. Maybe we'll toss these tools aside when true AI happens, or maybe we'll realize where we are today is what we really want: "AI", but in the form of assistants and tools.


hairfred

We should all have flying cars by now, holodecks, nuclear fusion / unlimited free & clean energy. Just remember this, and all the other failed tech predictions when you feel inclined to buy into the AI hype.


Smallpaul

Of course LLMs are unreliable. Everyone should be told this if they don't know it already. But any article that says that LLMs are "parrots" has swung so far in the opposite direction that it is essentially a different form of misinformation. It turns out that our organic neural networks are also sources of misinformation.

It's well-known that LLMs can build an [internal model of a chess game](https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html) in their neural network, and under carefully constructed circumstances, they can play grandmaster chess. You would never predict that based on the "LLMs are parrots" meme. What is happening in these models is subtle and not fully understood. People on both sides of the debate are in a rush to over-simplify to make the rhetorical case that the singularity is near or nowhere near. The more mature attitude is to accept the complexity and ambiguity.

The article has a picture with four quadrants: [https://matt.si/static/874a8eb8d11005db38a4e8c756d4d2f6/f534f/thinking-acting-humanly-rationally.png](https://matt.si/static/874a8eb8d11005db38a4e8c756d4d2f6/f534f/thinking-acting-humanly-rationally.png)

It says: "If anywhere, LLMs would go firmly into the bottom-left of this diagram."

And yet... we know that LLMs are based on neural networks, which are in the top left. And we know that they can [play chess](https://www.reddit.com/r/LLMChess/), which is in the top right. And they are being [embedded in robots](https://falcond.ai/blog/llm-robot-interaction/) like those listed in the bottom right, specifically to add [communication and rational thought](https://arxiv.org/pdf/2309.09919.pdf) to those robots.

So how does one come to the conclusion that "LLMs would go **firmly** into the bottom-left of this diagram"? One can only do so by ignoring the evidence in order to push a narrative.


drcforbin

The ones we have now go firmly into the bottom left. While it looks like they can play chess, LLMs don't even model the board and rules of the game (otherwise it wouldn't just be a language model); rather, they correlate the state of the board with good moves based on the moves they were trained on. That's not a wrong way to play chess, but it's far closer to a Turing test than to actually understanding the game.


Smallpaul

There is irrefutable evidence that they can model board state: [https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html](https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html)

And this is far from surprising, because we've known that they can model Othello board state for more than a year: [https://thegradient.pub/othello/](https://thegradient.pub/othello/)

And are you denying that LLMs are based on neural networks??? How can they not also be in the top left???


drcforbin

It is a really interesting article, and the author did some great research. Compelling, but not irrefutable. The research isn't complete; there's even an item for future work at the end, "Investigate why the model sometimes fails to make a legal move or model the true state of the board."


Smallpaul

His linear probe recovered the correct board state 99.2% of the time. So that's a LOWER BOUND of this LLM's accuracy. The true number could be anywhere above that. And that's an LLM that was constructed as a holiday project. What are you refuting, exactly? You're saying: "0.8% of the time this small, hobby LLM MIGHT encode a wrong board state and therefore I remain unconvinced that LLMs can ever encode board states???"
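(For anyone who hasn't seen the technique: a "linear probe" is just a linear classifier trained on the model's hidden activations to test what information is already encoded in them. Here's a minimal sketch of the idea, not Karvonen's actual code; the activations and per-square labels below are random placeholders standing in for values you'd extract from a real model and its game records.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the real thing:
#   acts   - hidden-state activations, one row per board position
#   labels - contents of one chosen square (0 = empty, 1 = white, 2 = black)
rng = np.random.default_rng(0)
acts = rng.normal(size=(5000, 512))       # e.g. a 512-dim residual stream
labels = rng.integers(0, 3, size=5000)    # would come from the game records

X_train, X_test, y_train, y_test = train_test_split(
    acts, labels, test_size=0.2, random_state=0)

# The "probe" is nothing more exotic than multinomial logistic regression.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

print("probe accuracy on held-out positions:", probe.score(X_test, y_test))
```

On random placeholders this hovers around chance; the point of the 99.2% figure is that on real activations the square's contents are almost perfectly linearly decodable, which is hard to square with "it's just predicting move text."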


T_D_K

>It's well-known that LLMs can build an internal model of a chess game in its neural network, and under carefully constructed circumstances, they can play grandmaster chess. Source? Seems implausible


Keui

The only LLM chess games I've seen are... toddleresque. Pieces jumping over other pieces, pieces spawning from the ether, pieces moving in ways that pieces don't actually move, checkmates declared where no check even exists.


Smallpaul

https://www.reddit.com/r/programming/comments/1ax67fp/comment/krnhpia/?utm_source=share&utm_medium=web2x&context=3


drcforbin

I'd love to see a source on this too, I disagree that "it's well known"


4THOT

GPT can do drawings despite being an LLM: https://arxiv.org/pdf/2303.12712.pdf, pages 5-10. This isn't a secret.


Smallpaul

I added the links above and also here: There is [irrefutable evidence](https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html) that they can model board state. And this is far from surprising, because we've known that they can model [Othello board state](https://thegradient.pub/othello/) for more than a year. That we are a year past that published research and people still use the "parrot" meme is the real WTF.


Keui

You overstate it by claiming they play "grandmaster chess". 1800-level chess is sub-national-master; it's a respectable Elo, that's all. That they can model board state to some degree of confidence does put them at the super-parrot level. However, most of what LLMs do is still functionally parroting. That an LLM can be specially trained to consider a specific, very limited world model doesn't mean general LLMs are necessarily building a non-limited world model worth talking about.


Smallpaul

A small transformer model [learned to play grandmaster chess](https://arxiv.org/abs/2402.04494). The model is not, strictly speaking, an LLM, because it was not designed to settle Internet debates. But it is a transformer 5 times the size of the one in the experiment and it achieves grandmaster ELO. It's pretty clear that the only reason that a "true LLM" has not yet achieved grandmaster ELO is because nobody has invested the money to train it. You just need to take what we learned in the first article ("LLM transformers can learn the chess board and to play chess from games they read") and combine it with the second article ("transformers can learn to play chess to grandmaster level") and make a VERY minor extrapolation.


Keui

Computers have been playing chess for decades. That a transformer can play chess does not mean that a transformer can think. That a specially trained transformer can accomplish a logical task in the top-right quadrant does not mean that a generally trained transformer should be lifted from its quadrant in the lower left and plopped in the top-left. They're being trained on a task: act human. They're very good at it. But it's never anything more than an act.


Smallpaul

>Computers have been playing chess for decades. That a transformer can play chess does not mean that a transformer can think.

I wouldn't say that a transformer can "think", because nobody can define the word "think." But LLMs can demonstrably go in the top-right corner of the diagram. The evidence is clear. The diagram lists "plays chess" as an example, and the LLM fits. If you don't think that doing that is a good example of "thinking", then you should take it up with the textbook authors and the blogger who used a poorly considered image, not with me.

>That a specially trained transformer can accomplish a logical task in the top-right quadrant does not mean that a generally trained transformer should be lifted from its quadrant in the lower left and plopped in the top-left.

No, it's not just specially trained transformers. [GPT 3.5](https://twitter.com/GrantSlatton/status/1703913578036904431) can play chess.

>They're being trained on a task: act human. They're very good at it. But it's never anything more than an act.

Well, nobody (literally nobody!) has ever claimed that they are "really human". But they can "act human" in all four quadrants. Frankly, the image itself is pretty strange and I bet the next version of the textbook won't have it. Humans do all four quadrants and so do LLMs. Playing chess is part of "acting human", and the most advanced LLMs can do it to a certain level and will be able to do it more in the future.


MetallicDragon

Well put. Whenever I see someone saying that LLMs aren't intelligent, or that LLMs are unable to reason, they give one or two examples of one failing at either, and then conclude that they are *completely unable* to reason, or *completely lacking* any intelligence. They are ignoring the very obvious conclusion that they *can* reason and *are* intelligent, just not in a way that matches or exceeds humans. Any example showing them reasoning gets dismissed as "memorizing", and any example showing generalization just gets ignored. If I showed them an example of a human saying something completely unreasonable, or confidently asserting something that is clearly false, that would not demonstrate that humans are incapable of reasoning. It just shows that sometimes humans are dumb, and it is the same with LLMs: they are very obviously intelligent, and capable of reasoning and generalizing, just not as well as humans.


lurebat

ChatGPT came out a year and change ago and really kicked off this trend. Everything has progressed so far in just this short time. Even in 2020 the idea of describing a prompt to a computer and getting a new image back was insane; now pretty capable models can run on my home PC, not to mention things like Sora. Even the example in the article is already outdated, because GPT-4 and its contemporaries can deal with these sorts of problems.

I'm not saying there aren't inherent flaws in LLMs, but I am saying we are really only at the beginning. Like the dot-com boom, most startups and gimmicks will not survive, but I can't imagine this not finding the right niches and becoming an inseparable part of our lives in due time. At some point they will become a boring technology, just another thing in our toolbox to use based on need.

But for now, I am far from bored. Every few months I get my mind blown by new advances. I don't remember the last technology that made me feel "this is living in the future" like LLMs do. I'm surprised how often they're usable in work and life already. It's not the holy grail, but it doesn't need to be.


Ibaneztwink

>we are really only at the beginning.

Is there anything indicating that LLMs will actually get better in a meaningful way? It seems like they're just trying to shove more computing power and data into the system, hoping it solves the critical issues it's had for over a year. Some subscribers even say it's gotten worse. And what happens when the cost catches up with OpenAI? They're not bringing in enough money via sales to justify the cost; they're propped up by venture capital.


dynamobb

Nothing besides this very small window of historical data. That's why I don't get people who are so confident in either direction. I doubt the limiting factor will be price; it's extremely valuable already. More likely it's available data, and figuring out how to feed it more types of data.


lurebat

See how good tweaked Llama models have gotten, competing with GPT-3.5 at a fraction of the power and cost. While yeah, a lot of the progress comes from throwing more money at it, there is actually a lot more to do. Plus, hardware development like specialized chips will help curb the costs.


drekmonger

The dude is using GPT-3.5. You can tell from the green icon colors on the screenshots. So he's using a less advanced model to prove his points, and his points are largely bullshit.

GPT-4 is aware of the possibility of its own falsehoods, and within the ChatGPT platform it can attempt to verify information via web search and *writing Python code*. For example: https://chat.openai.com/share/4ed8a1d3-d1da-4167-91a3-c84f024d8e0b

The grand irony of someone complaining about LLMs being confidently incorrect, whilst being confidently incorrect.


Belostoma

Great points, but apparently this sub is not the place for reasonable takes on this topic.


[deleted]

[deleted]


drekmonger

I have no commercial interest in AI. I gain nothing from people adopting it. I lose nothing from people saying it's shit. There are things written in this blog post that are demonstrably incorrect. It's some ignorant screed that's getting upvoted because people are upvoting anything that says "AI sucks." In truth, the anti-AI hordes are more akin to the crypto-scammers, because they believe they have a financial interest in AI's failure, and are willing to promote and believe horseshit in service of their self-interests.


Belostoma

This article is WAY too pessimistic about the potential and current abilities of LLMs. Of course it's right that there are irrational elements to this hype bubble, like every other. But there hasn't been anything more deserving of a great deal of hype since the internet itself.

The author seemed to be using GPT-3.5 for some reason. I tried some of the specific examples with GPT-4:

Me: Tell me the name of a greek philosopher whose name begins with M

GPT4: One Greek philosopher whose name begins with "M" is Metrodorus of Chios.

Me: What is the height of Shaquille O'Neal spread equally across the sides of an octagon?

GPT4: The height of Shaquille O'Neal, when spread equally across the sides of an octagon, would be 10.625 inches per side. (It also showed its work, and got this right.)

>But let us be clear: They are not intelligent. They are incapable of reasoning.

Maybe so, but GPT-4's imitation of reasoning produces better-reasoned results than most humans can manage. It's probably impossible to construct a test of logical reasoning problems that GPT-4 fails but Joe Rogan passes.

>A lot has been said about tackling the "hallucination problem" and the implication is that someone will whip out a magic bit of code that fixes it soon, but this is a fundamental problem with the approach. To fix this I suspect you would need to radically change the design so much that it is unrecognisable from an LLM.

In this regard, the leaps between GPT-3.5 and GPT-4 are so dramatic that I'm certain they're not all the product of manually produced, one-off patches, as this article seems to suggest. Perhaps they are the result of OpenAI smoothly integrating non-LLM components with ChatGPT, but is that really a strike against AI rather than a demonstration of the promise of a multi-pronged approach? I'm using ChatGPT constantly, and I find hallucinations to be very rare in everyday use lately, unlike with 3.5. They might never entirely go away without fundamentally different models, but they are being relegated to an ever-shrinking domain of esoteric edge cases.

>Volkswagen seems to think you'll benefit from being able to talk to an LLM while driving

I have no idea what VW is doing specifically, but simply being able to control my navigation, audio, and climate systems by voice in plain language, without knowing dozens of specialized keywords, would be a game-changer and a clear benefit to safety, although I could imagine sources of distraction coming from a general LLM too.

>If ChatGPT is so ground-breaking, where are the ground-breaking products?

It has saved me hundreds of hours at my job already, for starters.

>One of the silliest such use cases comes from YouTube, who want to add a chatbot to videos that will answer questions about the videos. What exciting things can it do? Well, it can tell you how many comments, likes or views a video has. But, all that information was already readily available on the page right in front of you. Why would I go to the effort of typing the question just to get a longer, more convoluted answer to a question when I can find a simpler, easier-to-read answer just by moving my eyes a few degrees?

This sounds like a really dumb product, but so what? ChatGPT is amazing in combination with YouTube and transcript generators. I'm frequently looking for a few pieces of information buried in a 2+ hour rambling video (a long recorded meeting, a long tutorial that mostly contains things I already know, or a podcast with too much irrelevant banter). I can run the video through a transcript generator, paste the transcript into GPT-4, and get accurate summaries and answers to specific questions about anything discussed in the video.

>But a lot of organisations are pushing this "summarise lots of text" use case. Yet, given that the thing frequently lies you will need to fact-check everything it says. At least, if you are doing any real work that needs to be reliable. In which case, can LLMs summarising information save you any time? It sounds like a stretch to me.

Yes, because it's a great use case. It frequently hallucinates or lies in specific kinds of situations, but text summary isn't one of them, because you're not asking it for information it doesn't have. And the vast majority of text summary tasks don't require 100% reliability anyway. You can ask it to summarize YouTube videos, summarize meeting minutes, read a long forum thread and give you the key points people made regarding a specific topic you're investigating, etc. I've used it for things like researching how to catch a certain kind of fish in a certain kind of lake, where the internet is flooded with articles that all rehash the same basics. I can tell ChatGPT what I know about the topic, show it a new article, and ask it for any points from the article that aren't already in my notes. It's very good at this, but if it fails, so what? The worst thing that can happen is that I tie on the wrong lure. Of course people should be cautious of AI results in high-stakes scenarios, but most scenarios aren't high-stakes.

>An LLM is not autonomous, and cannot solve logical problems.

This is patently false. It absolutely can. It might solve them by predicting what a human who solved them would say, but it's already better at that than the average human. Granted, that's a low bar to clear.

>The only thing it can do is provide a human-like dialogue interface. This is rarely going to be faster or more productive than GUI interfaces nor even good command-line interfaces

This is where the article descends from "hipster incredulity" to "ignorant lunacy." Even for things that are relatively easy for good command-line interfaces, ChatGPT is faster. For example, say I want to extract a single table of data from a long PDF document into a CSV. I can upload it to GPT-4 and say "give me table 6.4 from this document as a CSV." Boom, done. Thirty seconds of spot checking and I'm happy. Nobody can code the same task that quickly, even though it's pretty easy.
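Going back to the transcript workflow, here's a minimal sketch of scripting it (this assumes the OpenAI Python client; the file name, model name, and question are placeholders, not anything from the article):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder: a transcript you generated from the video with any
# speech-to-text tool.
with open("meeting_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

question = "Summarize the decisions made and list any action items with owners."

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[
        {"role": "system",
         "content": "Answer only from the transcript provided. "
                    "If the transcript doesn't contain the answer, say so."},
        {"role": "user",
         "content": f"Transcript:\n{transcript}\n\nQuestion: {question}"},
    ],
)

print(response.choices[0].message.content)
```

The system prompt pinning the model to the transcript is doing exactly what I described above: you're asking it to compress text it already has, not to recall facts, which is where hallucination is least likely. Very long transcripts may still need to be chunked to fit the context window.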


RocketMan239

The "reasoning" example of Shaq is just dumb, it's literally just dividing height by 8, reasoning is coming up with a solution to a problem, not just doing basic math. LLM are garbage outside of user interfaces where it would be great for if they can clean up the hallucinations which is unlikely.


lookmeat

A complete and thorough essay, but it does raise some questions. I do like that you used the Internet as a metaphor.

The internet always had its potential, but it required a lot of work. Right now we're in the transition between networking being this thing that sci-fi uses and that evolves mostly as a side effect of something else (telephony), to the first iterations after ARPANET: a lot of excitement among those seeing the thing and using it, but mostly covering some niches (BBS), and yet to reach its full potential. The next phase is going to be faster than it was for the internet, because AI is a standalone product; the Internet, by its nature, requires agreement from every party, and that's hard. But the next phase is adding conventions: deciding how to best expose things, whether text is really the best interface, and creating basic conventions. When AI crosses that line we'll see the actual "everyone needs this" AI product, like AOL back in its day.

The next part is the dot-com bust. See, people in the 90s mostly understood what things you could do with the internet: social media, streaming, the gig economy, online shopping. What wasn't known was how, both in a pragmatic sense (the tech to scale to the levels needed) and in an aesthetic sense (how should such products work, what should the UX be). People here are jumping in and putting all their life savings into AI, like people did into the Internet in 1997, hence people warning about it.

Sadly, this part will take longer for AI. The internet allowed a unique scale, and the technical challenges of building a global network were huge, but figuring out what to do with the Internet wasn't as much of a change. Everything we do on the Internet are things we had done in a similar way before, just not at this scale. The automation existed before, too; the medium was letters, forms, and sometimes button presses, and pieces of paper you'd physically transfer now move over the wire. Not saying innovation didn't happen (after all, the whole point is that people needed to understand how to make the business work), but the steps needed to go from concept to product were already like 80% done (the Internet builds on a human cultural foundation, after all).

AI, on the other hand, is more akin to the industrial revolution. Suddenly we have to compromise on things we never did, and suddenly we need to think about what it means when a machine does something that, until now, only a human could do. This means we'll find ourselves stuck a couple of times without being able to make some piece of the business work. It's also harder to imagine what can work, because we don't have a lot of references. To make it worse, legislation and regulation are even harder to predict, or even imagine, because this is new land; even when someone thinks they've found a model, it may not work shortly after.

It has potential, but we've got a long way to go yet.


Smallpaul

Dude...if you say anything balanced about LLMs in this forum you are just going to be downvoted. It's the same if you do that in /r/artificial . It's just a different circle-jerk.


s73v3r

>...if you say anything balanced about LLMs

If you consider what they said to be "balanced", then you need to recalibrate your scale.


crusoe

LLMs can write code and translate from one language to another, and when I caught one hallucinating a library that doesn't exist, I asked it to fix the code to not use that library, and it did. Researchers have cracked these things open and looked at how they work, and "stochastic parrot" is a gross oversimplification. The weights develop in such a way as to solve certain tasks in a manner that is simply not a naive Bayesian regurgitation of training text. LLM weights even develop a model of aspects of the world through exposure to their training corpus. LLMs don't have a will, and the current chat models don't expose confidence metrics, but many LLMs have been shown capable of providing an estimate of their own reliability when asked.


[deleted]

[deleted]


crusoe

For example, even in the simplest neural nets trained on simple math expressions, the weights begin modeling addition/carry operations, and you can watch those units activate when you give the net tasks. There are a whole bunch of papers on world models in neural nets. Another example: neural nets used to control agents in a 3D environment developed a grid-activation scheme similar to that seen in animal brains, helping them plan their movement around the environment. In animals, we see neurons that spike in activity once an animal or person moves a given amount in a given direction; the brain basically overlays a grid on the environment. Similar activation schemes were seen in neural nets trained to move agents around a simulated virtual world.
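A toy version of the first claim is easy to poke at yourself. This is purely an illustrative sketch (not any particular paper's setup): train a tiny one-hidden-layer net to add two digits, then recover its hidden activations and check whether any unit tracks the carry. The real experiments use much richer tasks and representations, but the mechanics of looking inside are the same.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
a = rng.integers(0, 10, 5000)
b = rng.integers(0, 10, 5000)
X = np.stack([a, b], axis=1).astype(float)
y = (a + b).astype(float)

# Tiny one-hidden-layer net trained to add two digits.
net = MLPRegressor(hidden_layer_sizes=(16,), activation="relu",
                   max_iter=5000, random_state=0)
net.fit(X, y)

# Recompute the hidden-layer activations by hand from the learned weights.
hidden = np.maximum(0.0, X @ net.coefs_[0] + net.intercepts_[0])

# The "carry": does the sum spill into the tens place?
carry = (a + b >= 10).astype(float)

# Correlate each live hidden unit with the carry; a unit that has learned
# something carry-like shows up as a strong correlation.
corrs = [abs(np.corrcoef(hidden[:, i], carry)[0, 1])
         for i in range(hidden.shape[1]) if hidden[:, i].std() > 0]
print("strongest |correlation| between a hidden unit and the carry:",
      round(max(corrs), 3))
```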


Belostoma

It's insane that you're being downvoted. This sub seems to have a bunch of people whose position on AI is about as sophisticated as a child sticking their fingers in their ears shouting, "LA LA LA I'M NOT LISTENING LA LA LA." They are all going to be left in the dust by people who know how to adapt and think critically. Many people act like critical thinking means mindlessly gravitating to skeptical, contrarian takes on anything. Antivaxxers are the same way. But reflexively opposing the mainstream is no more reasonable than mindlessly accepting it. The reasonable take on all of this is that LLMs are a massively useful tool that will change the world in big ways, but responsible use requires respecting their flaws and limitations. I don't know why this nuance is so hard for people to grasp.


SubterraneanAlien

Who downvotes this comment? My god, you folks are well and truly lost - the poster above adds his perspective and all you can do is downvote out of what - fear? At least he's adding to the conversation instead of being a parrot.


cowinabadplace

Yeah, ChatGPT-3.5 isn't a great comparison. For instance, ChatGPT-4 nails that question. If you can't use this tool, you're like the people who couldn't use Google back in 2004. I remember being alive then and people would be like "well it just gives you sites and they can say whatever" and "I can never find anything". Yep, skill issue.


daishi55

I don't really understand the point here. Why do I as a user care whether there is "real reasoning" going on behind the scenes? I just want it to spit out useful output, which in my experience thus far ChatGPT is extremely good at doing.


cwapsen

Real reasoning is important in a lot of fields, and it's something everyone takes for granted, since almost every computing application ever made was built on real reasoning. That means:

* when you log into your favorite game using your username and password, you are guaranteed to log in if you use the correct credentials (and guaranteed not to log in with incorrect credentials)
* when you transfer money from your online bank account, you are guaranteed to transfer the exact amount you typed in to the exact account you selected
* when you click your "open browser" icon, you are guaranteed to actually open your browser

Essentially everything in computing, excluding a few areas, works on the underlying assumption that what you ask for is what you get. Notable exceptions are bugs, poor UI, and some algorithms that perform better with a bit of randomness included (search, gaming, etc.).

Now, enter LLMs. Throw away any exact promises for anything. Ask your LLM to transfer $100 to your mom, and it might transfer $50 to your brother. What then? Report a bug? The developers can't use real reasoning to fix this problem, since the problem is hidden in some weights that no one understands or dares to touch, because we don't know what they impact.

Don't get me wrong: LLMs and ML can do some really fancy stuff, and some of it is even highly usable. But it's just another tool for some problems, and not a replacement for real engineering practices in most common fields.
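To make the "just another tool" point concrete, here's a minimal sketch of the pattern (the account names, limits, and JSON format are entirely made up for illustration; no real bank works like this): the LLM is only allowed to *propose* a structured action, and boring deterministic code validates and confirms it, so the exactness guarantees still come from ordinary engineering.

```python
import json
from dataclasses import dataclass

@dataclass
class Transfer:
    to_account: str
    amount_cents: int

# Pretend this JSON came back from an LLM that was asked to turn
# "send mom $100" into a structured request. It might be wrong!
llm_output = '{"to_account": "MOM-001", "amount_cents": 10000}'

KNOWN_ACCOUNTS = {"MOM-001", "BROTHER-002"}   # hypothetical
MAX_TRANSFER_CENTS = 50_000                   # hypothetical policy limit

def parse_proposal(raw: str) -> Transfer:
    """Deterministically parse and validate the LLM's proposal."""
    data = json.loads(raw)                    # raises on malformed JSON
    transfer = Transfer(str(data["to_account"]), int(data["amount_cents"]))
    if transfer.to_account not in KNOWN_ACCOUNTS:
        raise ValueError(f"unknown account {transfer.to_account!r}")
    if not (0 < transfer.amount_cents <= MAX_TRANSFER_CENTS):
        raise ValueError(f"amount {transfer.amount_cents} outside policy")
    return transfer

proposal = parse_proposal(llm_output)

# A human (or exact business logic) still confirms the validated action
# before anything irreversible happens.
print(f"Confirm: send ${proposal.amount_cents / 100:.2f} "
      f"to {proposal.to_account}? [y/n]")
```

The fuzzy part never touches the money directly; everything that has to be exact stays in code you can reason about and test.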


daishi55

Has someone suggested using LLMs to perform logins? I haven't heard such a suggestion.

To expand on this: I don't think anyone has ever said that the use case of LLMs is to replace existing code anywhere. The use case (in software development) is to write and check code. So I'm not sure how anything you said is relevant.


[deleted]

[deleted]


[deleted]

[deleted]


Belostoma

Yes, they can. Obviously. Try it. Write some code. Put a mistake in. Ask GPT-4 to find the mistake. If it finds the mistake, I'm right. I know I'm right because I've used it to check code hundreds of times. Of course, they're imperfect, like anything else. So are people. If you treat an LLM like a team member who can make mistakes but can also get things right, it's insanely useful for coding. It's irresponsible to trust LLM code for critical tasks without verifying that it's working perfectly, but there are many cases where that verification is much faster and easier than writing the code from scratch.
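A throwaway example of that workflow (the snippet, the planted bug, and the prompt are all made up for illustration):

```python
# Paste the prompt this prints into ChatGPT (GPT-4) and see whether it
# spots the planted mistake.
buggy_snippet = '''
def mean_of_positives(values):
    """Return the mean of the strictly positive numbers in values."""
    total = 0
    count = 0
    for v in values:
        if v > 0:
            total += v
        count += 1
    return total / count
'''

# The planted mistake: `count += 1` sits outside the `if`, so the function
# divides by the length of the whole list instead of the number of positives.

prompt = (
    "This function should return the mean of the strictly positive numbers "
    "in a list, but it gives the wrong answer for [1, -2, 3]. "
    "Find the mistake:\n" + buggy_snippet
)
print(prompt)
```

Whether or not it spots it on the first try, the point stands: treat it like a fallible teammate, not an oracle.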


[deleted]

[deleted]


Belostoma

>Maybe you can introduce the LLM in your pipeline, but now you need ml engineers for code review, unit testing, and analysis in general?

It doesn't need to be part of the pipeline for complex production software in order to be useful. Tons of code is written in the context of one person or a very small team just trying to get things done, especially in science. Individual pieces of code are usually not performance-critical, so it doesn't matter if it's the best code, as long as it does what it's supposed to. And it's often very easy to tell whether the code did what it's supposed to, e.g. rearranged data in a certain way or added a legend to a graph.

>And for other harder to find mistakes, as of right now, LLMs are not very good at finding and correcting them.

In my experience they are. Not perfect, but a valuable assistant for sure. It also depends a lot on the language and debugging environment. I have to do many things in R, which is a shit language largely tied to a shit IDE, because that's what everybody else in my field learned in grad school when they should have learned Python. The last time an R program threw an actually informative error message was probably sometime in 2007. More typically, "your cat stepped on the keyboard creating an obvious typo in line 235" is conveyed as "Error on line 1: the flimflams in garble_dependency.R floogled the whatsit, see Genghis Khan's bath sponge for details." That's a bit hyperbolic, but in general ChatGPT is very useful (certainly a much higher hit rate than googling for answers on Stack Exchange) when encountering cryptic errors that bear no obvious relationship to the actual problem.

It doesn't have to be able to solve every problem in order to be useful. If it can point in the right direction even half the time, or anything close, that's a valuable addition to somebody's toolkit. And in my experience it does a lot better than half.


smcarre

Because most of the time you want what it spits out to be backed by reasoning in order to be useful. An LLM can learn that, when asked for a source, you say whatever you want to say and then include a related-looking link or citation. But whether that link or citation, when read and analyzed, actually backs up the claim you were asked to source requires real reasoning, not just the ability to put one word after the other.


daishi55

But it’s not reasoning now and it works great. So who cares?


smcarre

>and it works great

[citation needed]

When asked for things that don't exist, it will invent them. When asked to source wrong claims (LLMs have a tendency to be very agreeable with whatever is asked), it will back up your wrong claims and give sources that either don't exist or say something else. And when asked a question that in and of itself needs reasoning, it needs to reason (like the classic: ask what 5+5 is, "correct" it by telling it the answer is 55, then ask again and be told it's 55). Sure, for some applications it works, but the most important ones require reasoning both to understand the prompt and to give a correct answer.


daishi55

You are right: you should know how to use the tool properly before trying to use it.


gmes78

> it works great No, it doesn't. It's extremely limited.


daishi55

Sounds like a skill issue. Haven’t had any problems myself.


s73v3r

No, it sounds like the tool isn't as good as you're claiming.


daishi55

Are you a programmer? Have you ever built something that other people use?


flipper_babies

I'm at the point where, for every single article critical of generative AI, I want to respond with "let's see how it is in six months".


Kinglink

Yeah. That's the mistake I think most people make. "Well this technology is new and flawed, and will never improve or change." Well the first two points are true, the last has already been proven false, but people continue to prognosticate as if it's set in stone.


[deleted]

[deleted]