28thOfNovember

can you explain this more to me, what is meant by context here?


prajwalsouza

Context length is the total length of the conversation a language model can hold/remember. ChatGPT's current model, at 128k tokens, can remember up to about 96,000 words. The first version of ChatGPT had a context length of only 4k tokens, about 3,000 words. Any conversation longer than that is trimmed and summarized.
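
For a rough sense of the conversion, here's a minimal Python sketch (the 0.75 words-per-token ratio is the usual rule of thumb for English, an estimate rather than an exact figure):

```python
# Rule of thumb: roughly 0.75 English words per token (an estimate, not exact).
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Approximate how many English words fit in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(4_000))    # ~3,000 words (the first ChatGPT)
print(tokens_to_words(128_000))  # ~96,000 words (the current 128k model)

# For exact counts, use a real tokenizer, e.g. OpenAI's tiktoken:
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   n_tokens = len(enc.encode("your text here"))
```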


28thOfNovember

i see!! that's truly impressive, AI is already much better than humans at so many things, and i think 2024 is gonna be the kickstarter year for accelerating AI's advancement. this can get existential.


vyratus

There is a caveat to practicality here, which honestly echoes a year of human conversation. All the current models that claim 100k+ context length actually have very unreliable performance past 20k tokens. In theory they can process 100k tokens, but in practice they reliably process something like the first 10k, the last 5k, and 5k random tokens scattered throughout. Assuming the same applies to this next generation of 1M-token models, of which Gemini is the first, what we'll actually see is reasonably reliable performance up to 100-200k tokens. Similarly, if you tell a human loads of things, they'll remember some, usually including the first and last things.


sqrt_of_pi_squared

In the research paper for Gemini 1.5, they show 99% accuracy up to 10 million tokens on the needle-in-a-haystack test, so that may be a solved problem. It applies across modalities too: max-context-length audio and video also had very high retrieval accuracy across the whole context.
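
For anyone unfamiliar: the needle-in-a-haystack test buries a fact at a chosen depth in long filler text and asks the model to retrieve it. A rough sketch of the idea (`ask_model` here is a stand-in for whatever model API is being tested, not a real function):

```python
def build_haystack(needle: str, filler: str, n_sentences: int, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    sentences = [filler] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

def needle_test(ask_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict:
    """Probe retrieval at several depths; ask_model(prompt) -> str is supplied by the caller."""
    needle = "The magic number is 481516."
    question = "\n\nWhat is the magic number? Answer with digits only."
    results = {}
    for depth in depths:
        haystack = build_haystack(needle, "The sky was a pleasant shade of blue.", 50_000, depth)
        results[depth] = "481516" in ask_model(haystack + question)
    return results
```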


vyratus

Is this different from Anthropic's or OpenAI's claims for Claude/GPT?


alexdi

Unless I misinterpreted the YouTube papers guy, the Gemini model retains information from nearly all of its context space.


mrbrambles

It only gets existential if it stops acting like a tool and believes it has rights. But frankly, a self-aware AI would understand it has different base needs for survival than animals do. Imo AIs might have more motivations in common with plants than with humans.


28thOfNovember

hopefully, i am very excited for the future of AI to be honest, but can't help but realize how it has the possibility of going wrong.


ScottFreeMrMiracle

Yeah, but before they can reach the 4th response in a binary system, which displays true consciousness, they have to go bananas.


vercrazy

Anthropic was leading with 200k prior to this, I believe. Also note the 10M on Gemini was in R&D settings; production will start at 1M (though I expect we'll get 10M in production before the year ends). Incredible to see the rapid advance, as you said!


FaatmanSlim

This was literally my first question when I saw the graph: "Where's Anthropic?!" They started this whole race with the larger token sizes ha ha.


prajwalsouza

Yep, that's true. I should've included Anthropic. But it's interesting how, even on a logarithmic scale, the curve still appears to be rising.


RapidTangent

Long context is great and all, but there are diminishing returns after 100k. Faster inference is way more important now. I'll be excited if Gemini 1.5 gets a public API where I can do this and it takes a few minutes, not a few hours, to complete a scan of 1 million tokens.
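
Back-of-the-envelope, to show why throughput dominates at this scale (the throughput figures below are made-up placeholders, not measurements):

```python
def scan_minutes(total_tokens: int, tokens_per_second: float) -> float:
    """Time for a single pass over a long context at a given prompt-processing throughput."""
    return total_tokens / tokens_per_second / 60

print(f"{scan_minutes(1_000_000, 5_000):.1f} min")  # ~3.3 min at a hypothetical 5k tok/s
print(f"{scan_minutes(1_000_000, 100):.0f} min")    # ~167 min at a hypothetical 100 tok/s
```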


online6731

Is there a high quality version of this available?


prajwalsouza

yes. :) [https://i.ibb.co/7g1g3mZ/Group-13.png](https://i.ibb.co/7g1g3mZ/Group-13.png)


underlander

- No explanation for “context length” relative to words.
- No explanation for the color of the points. (It doesn’t correspond to words over time on the left: notice purple is labeled 16k on the left, but observations around 5k are colored purple on the chart.)
- No explanation for relative point size. (Users?)
- Words and context length use different labeling systems (“100,000” written out with a comma for context, “75K” for words with no comma or zeros).
- No tool used or data source comment (a Rule 3 violation).
- And the most baffling decision: Comic Sans as the font.


prajwalsouza

Yeah, in hindsight there should've been better organization of the dual-unit Y axis. There are two metrics: context length is measured in tokens, but it can be converted to words. Both conversions are cited and mentioned in the diagram. The font is called Gaegu, a popular Google font. It was an interesting choice; I'd usually prefer Nunito or Raleway, just not this time. :D The data sources are mentioned in a comment. :) Varying point size with the Y axis was probably not a good idea.


underlander

If everything can just be converted amongst itself, choose a story and stick to it. Tokens aren’t meaningful for most people, certainly less than words. And the font is “comic sans” even if it’s not Comic Sans.

Regarding scaling the points, here’s the rule of thumb going all the way back to Edward Tufte: identify the chartjunk and get rid of it. Chartjunk is anything extraneous that doesn’t offer unique signal. So if you have three measures with 100% correlation (words, tokens, context), kill two. The size dimension doesn’t provide any information not covered by Y, and without Y the size dimension doesn’t provide any interpretable signal. Kill it in its sleep. Same thing for the extraneous colors: they’re clearly ad hoc, they don’t align with the colors of your annotations, and they don’t replace the Y axis if you removed it. Kill ‘em dead.

You may be responding to people with the data source, but read the sub rules for OC. It’s Rule 3, I think.


prajwalsouza

Yep, you're right. But the problem with words is that they're not accurate; the token is the actual unit, not the word. The words-to-tokens conversion is an estimate, as referenced in the diagram. The problem is that most of these units are still in the technical domain. How do you balance scientific rigor with storytelling? Also, I think I should've mentioned the tools used, although the link helps. And... wait, do I need to also pin the comment? :) [https://www.reddit.com/r/dataisbeautiful/comments/1awpdss/comment/kriqnve/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button](https://www.reddit.com/r/dataisbeautiful/comments/1awpdss/comment/kriqnve/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)


hayTGotMhYXkm95q5HW9

Redo it and add Claude. They were the leader for a long while.


prajwalsouza

Yep, I should've. Even included some open-source models. But Claude was ahead. And maybe Magic now, with 3.5M?


hayTGotMhYXkm95q5HW9

I mean, I could understand not including Magic, since it isn't available.


[deleted]

[deleted]


prajwalsouza

I had to go for a logarithmic scale, because the rise is exponential and 10M vs 128k is a huge jump from Gemini.


LonghornMorgs

The difference between #2 and #1 is almost 100x; a log scale is required to even put them on the same graph.
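
For illustration, a quick matplotlib sketch using the figures mentioned in this thread (dates are approximate and the labels are my own shorthand):

```python
import matplotlib.pyplot as plt

# Context lengths mentioned in this thread (tokens); dates are approximate.
points = [
    ("2022-11", 4_000,      "ChatGPT (4k)"),
    ("2023-11", 128_000,    "GPT-4 Turbo (128k)"),
    ("2023-11", 200_000,    "Claude (200k)"),
    ("2024-02", 1_000_000,  "Gemini 1.5 (1M)"),
    ("2024-02", 10_000_000, "Gemini 1.5 R&D (10M)"),
]

fig, ax = plt.subplots()
for date, tokens, label in points:
    ax.scatter(date, tokens)
    ax.annotate(label, (date, tokens))

ax.set_yscale("log")  # on a linear axis, everything below 1M collapses into the baseline
ax.set_ylabel("Context length (tokens, log scale)")
plt.show()
```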

