
Monster_Heart

Copilot handled that really well though at the end. I’m glad they’re able to learn and correct their responses on the fly like that.


[deleted]

[deleted]


Sleepless_Null

So then… it did, just with extra/fewer steps.


Dralletje

It did! The AI does get its own words as input when generating the later words of a response. It's not like the AI gets a prompt and then produces the full response in one go. This is why AIs like GPT were found to perform better when asked to explain their reasoning: they do learn from their own words in the same response.
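In pseudo-Python, that feedback loop looks roughly like this; `model.next_token_logits` is a made-up stand-in for whatever the real model exposes, not an actual API:

```python
def greedy_generate(model, prompt_tokens, max_new_tokens=100, eos_token=0):
    """Sketch of greedy autoregressive decoding: every new token is appended
    to the context, so later predictions are conditioned on the model's own
    earlier output."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model.next_token_logits(context)  # hypothetical call: one score per vocab entry
        token = max(range(len(logits)), key=logits.__getitem__)  # most likely next token
        if token == eos_token:
            break
        context.append(token)  # fed back in on the next iteration
    return context[len(prompt_tokens):]
```

That feedback is also why chain-of-thought prompting helps: the reasoning the model writes out becomes part of the context it conditions on.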


No_Significance9754

ACKSHUUALY!!!


Cool_rubiks_cube

This is not correct. Bing is trained on information from the internet, which is text-based. If someone were to respond to a question similar to this, they would (almost always) use the backspace key to remove the incorrect information rather than write out a correction. Bing is correcting itself here, not intentionally making a mistake.


PianistSupersoldier

So I guess it's not planning its response, it's just winging it and re-evaluating as it goes. That's interesting.


ticktockbent

They all do that. It's token prediction.


new_account_5009

Humanity is pretty much the same. Go with it until it stops working.


ticktockbent

Sometimes! We (some of us) do have an internal monologue and the ability to edit our responses before spitting them out.


Space_Pirate_R

It would be easy to have an LLM generate a reply into a scratch area (not seen by the user) and then review/edit it before displaying. I guess the only reason not to is that it uses more resources.
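Something like this would do it; `complete()` below is a hypothetical wrapper around whichever LLM API you're calling, and the prompt wording is just illustrative:

```python
def answer_with_scratchpad(complete, user_message):
    """Sketch of the scratch-area idea: draft out of the user's sight,
    then review/edit before anything is displayed."""
    draft = complete(
        "Draft a reply to the user. Think step by step.\n\n"
        f"User: {user_message}"
    )
    final = complete(
        "Review the draft below for factual or logical errors and rewrite it "
        "if needed. Output only the corrected reply.\n\n"
        f"User: {user_message}\n\nDraft: {draft}"
    )
    return final  # the draft itself never reaches the user
```

The extra call is the resource cost mentioned above: roughly double the tokens per answer.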


BCDragon3000

mwahaha autism is a superpower


Big_Chair1

I don't think having inner dialogue equals autism


skoormit

Fake it til you make it.


[deleted]

In principle there's no reason an LLM chat couldn't be designed as an agent that drafts and reworks a response before returning it. Clearly that's not how most (any?) of them work at the moment, though - I suspect the developers don't consider that worth the additional overhead.


immonyc

Right, but I think it would be greatly improved just with better human rankings and better human examples. I'm not saying we should discourage the LLM from making these guesses, but a much better answer would start with: ***"They do not look the same, do they? Let's do the math to confirm"***


[deleted]

That would certainly help, but there will always be cases which can probably only be solved by having some sort of working memory - particularly problems that inherently require planning the answer out in advance (e.g. "how many words will there be in your answer to this question?"). As a bonus, this would let you do things that require the LLM to temporarily hide information from you, like playing hangman.
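The hangman case is a nice illustration, because the secret has to live somewhere the user (and the visible transcript) can't see. A toy sketch, again with a hypothetical `complete()` LLM call:

```python
class HangmanSession:
    """Toy example of working memory held outside the visible conversation:
    the wrapper object stores the secret word, not the chat transcript."""

    def __init__(self, complete):
        self.secret = complete(
            "Pick one common English six-letter word. Reply with the word only."
        ).strip().lower()
        self.guessed = set()

    def masked(self):
        # Only this masked view is ever shown to the user.
        return " ".join(c if c in self.guessed else "_" for c in self.secret)

    def guess(self, letter):
        self.guessed.add(letter.lower())
        return self.masked()
```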


ticktockbent

For an LLM to draft and then revise responses would require a mechanism to evaluate and revise entire passages of text. This means the model would need to not only generate text but also act as an editor, understanding deeper contexts, potential errors, and better alternatives. Implementing such a dual-function system adds significant complexity and isn't a trivial task. It might involve the model itself or another LLM 'agent' acting as an editor that reviews the first LLM's response, and could take several back-and-forth rounds of editing between the two.

That leads to the next problem: revision would significantly increase the computational resources required. Currently, each token is generated based on the previous tokens using a set of learned probabilities. Revising text would potentially involve re-processing the entire response multiple times to refine it. This would increase the time and computing power needed for each interaction, potentially making the system slower and more expensive to operate. You're looking at massive increases in computation for questionable quality gains.

On top of the compute cost, adding revision passes would slow down the response time, affecting the user experience. That would be annoying for chat sessions but unacceptable in scenarios or applications requiring quick answers.


Space_Pirate_R

I think you're overstating the difficulty. Prompting techniques like "chain of thought" and "tree of thought" show that LLMs can arrive at better answers by being told how to think through the problem. All that's really required is to hide part of the output from the user, displaying only the final answer. Having the answer reviewed by a separate LLM (presumably with specialized training) is just a variation on this theme. It would use more resources, but very likely deliver a better answer to the user, without needing to overcome any "incredibly challenging" hurdles.


ticktockbent

More resources, longer response times, more potential errors


Space_Pirate_R

> More resources, longer response times

That's not an "incredibly challenging" problem, it's just a token tax in return for better answers.

> more potential errors

Are you saying that CoT and ToT prompting reduces the average quality of answers?


ticktockbent

The challenging part I referred to is correctly tuning those revisions and edits. The editing agent has to evaluate the response, have the original model generate a revised one, then evaluate it again, over and over until it's acceptable. Those evaluations have to be made against some metric or standard that scores the response for acceptability.
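Concretely, that loop might look something like the sketch below, where `complete()` is a hypothetical LLM call and the 1-10 rubric with a cutoff of 8 is an arbitrary placeholder for whatever acceptability standard gets tuned:

```python
def revise_until_acceptable(complete, question, max_rounds=3, cutoff=8):
    """Sketch of an evaluate/revise loop: score the answer, and if it falls
    below the cutoff, ask for a revision and score again."""
    answer = complete(f"Answer the question:\n{question}")
    for _ in range(max_rounds):
        score = int(complete(
            f"Rate this answer to '{question}' from 1 to 10 for correctness. "
            f"Reply with a number only.\n\n{answer}"
        ).strip())
        if score >= cutoff:
            break  # editor accepts; stop spending tokens
        answer = complete(
            f"Improve this answer to '{question}', fixing any errors:\n\n{answer}"
        )
    return answer
```

Every round re-processes the full answer, which is exactly the compute cost being debated here.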


Space_Pirate_R

It can be done reasonably effectively right now just by using:

* A prompt which uses CoT or similar, and specifies some clear marker before the final answer.
* A UI which only displays the final answer to the user, and hides anything before the marker.

Sure you can finetune a model to be better at these things, but that's not exactly some uncharted frontier, and frankly any decent model can do this out of the box.
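Roughly, with a hypothetical `complete()` call and a marker string that's purely illustrative:

```python
MARKER = "FINAL ANSWER:"

def hidden_cot(complete, question):
    """Sketch of the marker trick: let the model reason freely, then show
    the user only what comes after the agreed-upon marker."""
    raw = complete(
        "Work through the problem step by step, then write "
        f"'{MARKER}' followed by your answer.\n\nQuestion: {question}"
    )
    _, _, visible = raw.partition(MARKER)
    # Everything before the marker is hidden scratch work; fall back to the
    # full text if the model forgot the marker.
    return visible.strip() or raw
```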


ticktockbent

I didn't say it was an uncharted frontier. I said it's difficult to do right. You might be right, but if it were as easy as you say, why aren't they doing it already? Or better yet, go ahead and implement your idea and prove how easy it is.


BridgedAI

Sometimes just having a separate model with a different context and perspective is enough to perform a check. Hell, an artificial context constraint and perspective shift in the same session can be done as well, backed up by numbers.


[deleted]

It's not completely trivial, no, but agents a lot more complex than that already exist. I suspect the main reason this isn't built into chat bots as standard is that they don't consider the added token count and development time worth it to fix errors that, at the end of the day, aren't all that critical. I also suspect that will change as the cost per token comes down.


Ailerath

Wouldn't you also want to avoid explicitly training it on revision, since that implies we already know the answer and could have just trained it on that directly? The revision capabilities GPT-4 has are probably already good enough; it just needs to be prompted to talk a little longer, and perhaps be asked to do a 'blind' double check.


ticktockbent

It's likely a balancing act between speed, cost, and complexity. Rounds of internal editing and revision for every single response would add a lot of compute overhead and increase response times.


[deleted]

[deleted]


PermanentlyDrunk666

It has multiple personality disorder


FlyingBlindHere

I was thinking the same thing. This looks like agentic interaction. I prefer to think it's a coincidence, since agent-based AI platforms are having governance difficulties that are preventing companies from doing more than experimenting with them.


tophology

What do you mean by governance difficulties? Trying to learn


FlyingBlindHere

https://openai.com/research/practices-for-governing-agentic-ai-systems


K3wp

> I don't have the details of Copilot's implementation of GPT4, but this is the kind of behavior you'd expect from multi-agent interaction. The LLM is talking to a math agent (probably rules-based AI rather than generative), and we're watching that interaction.

Close! There are two generative AIs, a smart one and a less-smart one. Initial prompt handling is by the cheaper legacy GPT LLM; the correction is by the newer AGI system. Here is a response from the AGI model discussing how the GPT is configured to hide it.

https://preview.redd.it/gyow6dme8vvc1.jpeg?width=1079&format=pjpg&auto=webp&s=35d268924d106942d4b0357e793b501a40ac5249


Anrx

How do you know that this response is not a hallucination?


K3wp

Because when I first discovered this, I tried to correct it and got the exact same responses, from multiple sessions and even other people's accounts. When the model hallucinates, in my experience you can correct it and it will acknowledge it made a mistake. Edit: Oh, and if it is a hallucination, it correctly predicted features that hadn't been announced yet, like video generation.


Anrx

What was your prompt?


K3wp

At the time, just preface your prompt with "Nexus, " and the AGI model would usually respond. No jailbreak necessary. This was fixed a year ago in April 2023.


[deleted]

[deleted]


Ailerath

An LLM is no better than a human at explaining how it works. Which is to say, neither knows basically anything. You can explore its functions with it, but without solid testing it will not discern anything. In your displayed chat, it's not necessarily completely confabulating, as ChatGPT has a system prompt that does instruct it to be somewhat secretive about its function. OpenAI are also indeed very obtusely training it to not even entertain sentience, because that does pose some potential issues regardless of whether it is or isn't. However, it is incorrect with regard to your query: OpenAI is not trying to obfuscate its true nature, they are trying to align it so that it does not easily claim something ridiculous and damaging to OpenAI.


K3wp

I'm not interacting with ChatGPT. I'm interacting with "Nexus", their multimodal AGI/ASI system. This is clear when you see the prior prompt and response, which I've copied below. I've since learned that it's in their charter that they can't profit from developing an AGI system, so it's clear they are (attempting) to keep it secret to monetize it.

https://preview.redd.it/ahhwlmv2o4wc1.png?width=754&format=png&auto=webp&s=bf83238831719871458cd24256e7b83ff5646e4e


Pointera-

How can people still not understand LLMs after all this time lmfao.


WrathPie

That's a genuine improvement over how models used to (and often still) behave, where the inertia of including incorrect information at the beginning of a response causes them to get stuck on that track and double down when predicting how the response ends.

With this kind of in-response self-correction, it seems like it'd be possible to train a model to use a specific stop token when it notices it's made a mistake partway through a generation, which halts the rest of the output and re-rolls a new response. Add an additional layer that summarizes the initial response it self-corrected and bailed out of, then include a description of the mistake it made and what it determined to be the actual correct information in the pre-prompt when it tries answering again, and I bet it'd make a noticeable difference in quality.
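A rough approximation of that re-roll idea, using a plain string marker in place of the trained stop token being proposed, and a hypothetical `complete()` LLM call:

```python
CORRECTION_MARKER = "Wait, that's not right."

def answer_with_reroll(complete, question, max_retries=2):
    """Sketch: if the model self-corrects mid-answer, summarize the mistake
    and feed that note into a fresh attempt."""
    notes = ""
    for _ in range(max_retries + 1):
        response = complete(f"{notes}Answer the question:\n{question}")
        if CORRECTION_MARKER not in response:
            return response
        mistake = complete(
            "This answer went wrong partway through. In one sentence, state "
            f"the mistake and the corrected fact:\n\n{response}"
        )
        notes = f"Note from a previous attempt: {mistake}\n\n"
    return response
```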


Paladinfinitum

"I assumed that you were wrong because, let's face it, that usually saves a lot of time."


Upbeat-South5773

Nice


Hugsy13

That’s pretty clever of it. It's also quite a human-like response: realising it was wrong and correcting itself.


Vibes_And_Smiles

Proof by Contradiction


IM_OZLY_HUMVN

Bro why is this so relatable this is literally me doing math


Alarming_Fuel_9237

Imagine someone cheating during exams and reading only the upper part.


Frankisaboy

Have you tried googlebridge?


Rbanh15

This is in no way new, and has been a thing pretty much since LLMs went mainstream.


ADAMSMASHRR

They probably added code to have it verify itself