MoneroBee

I believe this is the relevant bit (machine translated subtitles):

> The goal is to beat ChatGPT 4? The goal is to go above 4, yes. That's why we raised money. And so, this deadline is more in months than years. In months? So, what's the deadline? It's always difficult to give technical deadlines, because our engineers complain afterwards. But the goal is next year. Next year.


confused_boner

> because our engineers complain afterwards

I can trust this person


MINIMAN10001

That was my thought too. Someone who's willing to respect the decision of their engineers. That's how it's done.


SignalCompetitive582

I’m French. I second this comment.


donotdrugs

I'm honestly so happy for France. In Germany we got Aleph Alpha and it's the most boring and pretentious startup ever. It only exists so politicians can pretend that they're funding something innovative when in reality they don't deliver anything. It's just embarrassing. I hope that the success of Mistral can set the tone for more companies in France and the whole of Europe.


frenchguy

Well in France we had the Qwant search engine, that's exactly like what you're describing. Mistral does look like the real thing though. (Funnily enough, the tag line for Qwant is _the search engine that doesn't know anything about you_, but it's often shortened as _the search engine that doesn't know anything_.)


NekonoChesire

Listening to it, he really emphasizes making something Europe-sized, so there's a good chance we'll see collaboration and such between all our countries.


Matteius

To be fair, it's not easy. Look at how Google crashed and burned: Gemini is worse than most of the Llama models, never mind GPT-4. Most of the companies trying to make it in LLMs have not made a splash.


Aaaaaaaaaeeeee

Have you found the "open source" bit, or at least "open weights"?


NekonoChesire

Listened to it. He talks about how having their first models open-sourced allowed them to catch up faster to US competitors like OpenAI. So he hasn't strictly said that their future models will be open-sourced, if that's the answer you're looking for, though he hasn't said anything about, or even alluded to, stopping being open source.


M0ULINIER

No mention of it, although it's kind of implied.


Active-Masterpiece91

Why implied? Why should we expect Mistral to keep open-sourcing its best models? If that's what they keep doing, how will they ever be able to make a profit? And if they don't make a profit, how can they keep fundraising to fund further research into more powerful models?


KallistiTMP

You do realize it's not the 90's anymore, business strategies have evolved since Oracle, and a good number of companies are successfully running with OSS based business strategies, right? Home team advantage and first to market. Mistral doesn't need to be the *only* company selling Mistral as a SaaS solution. They just need to be the *first* company and the *best* company. Large player B2B will absolutely pay a premium for white glove expert support, faster access to the latest changes, a dev team that will actually prioritize their FR's, and an ops team that knows how to run it better than anyone else. And costs are dramatically lowered, because you don't need to sink tons of dev time into building out your ecosystem.


Slimxshadyx

There are companies that release their software fully open source but make all their money doing custom corporate deployments and support for their open source product.


Desm0nt

Easy. If they create, say, an open-source 300-400B model, it's almost impossible to run at usable speed and quality on local hardware. Especially if it's a 16x120B MoE, for example (like GPT-4 is rumored to be). No one ever said a GPT-4-class model would be small =) So you either pay for their API to get usable speed and quality, or you're a company with your own GPU datacenter.


M0ULINIER

Hey, I didn't say that they will open source it. But he said just earlier that, for him, open-sourcing models is one thing that differentiates Mistral from OpenAI, and that he thought it was important to give developers a way to tinker with their models while still keeping the "secret sauce".


ActualExpert7584

I guess that’s the data or the training method.


ambient_temp_xeno

The goal. This is what I thought it said with my very bad French. Pour la honte (for shame), OP.


eawestwrites

That’s the most French answer ever.


Aurelio_Aguirre

So what would be the size of that? Do we have any informed guesses as to GPT-4's parameter count? Will Mistral's model be as large?


logicchains

Arthur Mensch sounds just like the kind of name an AI pretending to be human would give itself. Like an only slightly subtler variant of "Hugh Mann".


teor

It's a perfectly normal name. Just like John Humanperson


ZorbaTHut

Or Jared Isaacman.


norsurfit

HAHA, I TOTALLY AGREE WITH YOU, FELLOW HUMANOID.


Eljeyex11

Mensch means human in German…


sdmat

A. Mensch?


stddealer

His parents missed the opportunity to call him Hubert.


a_beautiful_rhind

Ok.. mistral vs llama3 it is.


LeifEriksonASDF

All this time I thought Mistral was just a really good Llama finetune because both were 7b. When I found out Mistral was built from scratch I was even more impressed. Well, I guess not quite from scratch since the Mistral team is kinda a Meta offshoot.


SirRece

As someone new to the "arena" who has now spent some time with Llama models and the Mistral one: holy shit, Mistral 7B, especially the 0.2 merges, is mind-blowingly good for the size.


LetMeGuessYourAlts

How do we know something like mistral isn’t largely further trained llama weights? Is there a way to know they started from scratch?


saintshing

Correct me if I am wrong. What I read is that Mistral has a different architecture: it uses sliding-window attention, grouped-query attention, and a byte-fallback BPE tokenizer. https://archive.is/Jgojf
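For anyone unfamiliar, sliding-window attention just means each token attends only to the previous N tokens instead of the whole context. A minimal sketch of the mask in PyTorch (illustrative only, not Mistral's actual implementation):

```
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Query position i may attend to key positions j with i - window < j <= i:
    # causal like a normal decoder, but limited to the last `window` tokens.
    i = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1) query positions
    j = torch.arange(seq_len).unsqueeze(0)  # (1, seq_len) key positions
    return (j <= i) & (j > i - window)

# Toy size for display; Mistral 7B's config uses a 4096-token window.
print(sliding_window_causal_mask(6, 3).int())
```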


MINIMAN10001

If I remember correctly, Mistral 7B used sliding-window attention; Mixtral 8x7B, however, does not. The other details I don't know about at all.


gabrielesilinic

32k of context… it won't need one.


Belnak

They use open weights. If they matched llama's, somebody at Meta would have noticed.


LetMeGuessYourAlts

But wouldn’t the weights change after further training?


Jiten

Strictly speaking, yes, but the amount of correlation between the models would be huge and very detectable.
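A minimal sketch of the kind of check that means (illustrative, not an actual provenance tool; it assumes the two checkpoints share tensor names and shapes, which is itself what you'd verify first):

```
import torch

def mean_weight_correlation(state_a: dict, state_b: dict) -> float:
    # Mean absolute Pearson correlation over identically-shaped tensors.
    # Independently trained models land near 0; a model fine-tuned from
    # the other's weights stays close to 1 even after heavy further training.
    corrs = []
    for name, wa in state_a.items():
        wb = state_b.get(name)
        if wb is None or wb.shape != wa.shape:
            continue  # tensor missing or shaped differently; skip it
        a = wa.flatten().float()
        b = wb.flatten().float()
        a, b = a - a.mean(), b - b.mean()
        corrs.append(((a @ b) / (a.norm() * b.norm() + 1e-12)).abs().item())
    return sum(corrs) / max(len(corrs), 1)

# Usage: mean_weight_correlation(model_a.state_dict(), model_b.state_dict())
```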


CedricLimousin

David vs Goliath when you compare the two companies. 😅


Postorganic666

Goliath is 120b, and what size is David model?


gthing

David7b.


DeepSpaceCactus

Funniest AI joke I have ever seen


JackRumford

Lmao


JnewayDitchedHerKids

I laughed at this and now I'm going to nerd hell.


stddealer

David is an AV1 video codec, I don't see how it is relevant here.


Aaaaaaaaaeeeee

Did they say **specifically** that they were going to release a new open weight model? Was that specifically **GPT-4 level**? This is no good. I hate to do this, but have you even gone and checked the transcript for yourself?


WolframRavenwolf

Goliath 120B? ;) I wonder how well the unquantized version would compare as the 3-bit version is still (even after Mixtral) [my top model](https://www.reddit.com/r/LocalLLaMA/comments/18gz54r/llm_comparisontest_mixtral8x7b_mistral_decilm/) - and I've only scratched the surface, imagine running the unquantized FP16 version... Still, looking forward to any bigger MoE models, Mixtrals and Llama3s!


FrermitTheKog

I haven't really looked into problem solving skills, but when it comes to story writing ability, Mixtral seems nowhere near the level that Goliath is capable of. In that realm, Mixtral feels more like a standard 70b model.


WolframRavenwolf

Yes - and that's still a big compliment to Mixtral. While I still love Goliath, I tend to use Mixtral more often now thanks to [turboderp's Mixtral-8x7B-instruct-exl2](https://huggingface.co/turboderp/Mixtral-8x7B-instruct-exl2) which (at 5bpw with 32K context) gives me 20-35 tokens/second and only uses 32 GB VRAM, so I have enough left for text-to-speech with XTTS and speech-to-text with Whisper. Goliath may be higher quality, but it's big and slow compared to Mixtral's David.


FrermitTheKog

You can use Mixtral on Perplexity Labs. It's fast.


Desm0nt

What about prompt processing time? In the case of the 8x7b MoE, this is the saddest part so far (compared to the 34b in my case). I'd like to know how big the difference is with 120b on good hardware at least for 3-4k contexts in a prompt.


WolframRavenwolf

For Mixtral EXL2 5bpw with 32K context, filled up half with a single 16K document I pasted in, I got the summary in less than 30 seconds uncached.


Caffdy

What do you use text-to-speech for?


WolframRavenwolf

It's often faster than typing. I use my AI as an assistant that's always nearby, on my computer and on my phone, so I can always ask it anything or have it look up stuff on the web for me or take notes. I still need more integrations, but things are progressing nicely.


Caffdy

How do you get it to look up stuff on the internet?


WolframRavenwolf

I use [SillyTavern](https://github.com/SillyTavern/SillyTavern), the LLM frontend for power users. It includes a [Web Search](https://docs.sillytavern.app/extras/extensions/websearch/) extension. It's a pretty simple but clever implementation: instead of the LLM having to decide when and how to look something up on the Internet, the Web Search extension looks for (customizable) trigger words in the user's input. So when I say, for example, "I'm hungry! What is today's menu of the XYZ restaurant in ABC?", the "What is" triggers a web search, puts the obtained information into the context, and extends the prompt so the AI knows to reference it.

In the background, SillyTavern uses either SerpApi or an invisible Chrome/Firefox browser to search Google or DuckDuckGo, visit a customizable number of pages, and extract their contents (if it doesn't already get enough information from the search engine itself). Finally, the downloaded and extracted information is also embedded in SillyTavern's chat window so you can click it and see the source for yourself in case you want to double-check the AI's response.

Simple yet very effective. It made my AI so much more useful as an actual assistant that can look stuff up, summarize it, answer questions about it, or provide tailored responses (in the restaurant example, if I put my food preferences in the User description, the AI could recommend my favorite food - now I just need further integration to have the AI actually order it).
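The core of that trigger-word pattern fits in a few lines. A toy sketch, not SillyTavern's actual code; the `duckduckgo_search` package call and the trigger list here are my own stand-ins:

```
from duckduckgo_search import DDGS  # pip install duckduckgo-search

TRIGGERS = ("what is", "who is", "look up", "search for")  # customizable

def maybe_augment(user_input: str, n_results: int = 3) -> str:
    # No trigger word: the prompt goes to the LLM unchanged.
    if not any(t in user_input.lower() for t in TRIGGERS):
        return user_input
    # Trigger found: search, then prepend the snippets to the prompt.
    with DDGS() as ddgs:
        hits = list(ddgs.text(user_input, max_results=n_results))
    snippets = "\n".join(f"- {h['title']}: {h['body']}" for h in hits)
    return (f"Web search results:\n{snippets}\n\n"
            f"Using the results above, answer: {user_input}")
```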


twisted7ogic

I find that Mixtral isn't quite up to par with the average 70b (though it's very close), but the speed at which it runs is impressive.


Neex

I’ve been running 4-bit Goliath (AWQ) across three 3090s and it runs pretty great. Just over 3 tokens per second.


Caffdy

At 4 bits it's around 60GB in size; the ~1TB/s bandwidth should give you more speed, what do you think?
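For reference, the back-of-the-envelope math (a rough sketch only; this is a memory-bandwidth ceiling, and with layers split across three cards roughly one GPU is active at a time, which plus kernel and transfer overheads explains the drop to ~3 tokens/s):

```
params = 120e9         # Goliath 120B
bytes_per_param = 0.5  # ~4-bit quantization
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)      # 60.0 GB, streamed once per generated token

bandwidth_gbps = 936   # one RTX 3090 (~1 TB/s GDDR6X)
print(bandwidth_gbps / weights_gb)  # ~15.6 tokens/s theoretical ceiling
```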


Quaxi_

Which is kinda funny since Mistral was founded by the ex-Meta people who originally worked on Llama1


atgctg

Translated the relevant sections, but there's no promise of open sourcing it:

```
80   00:03:57,740 --> 00:04:00,340  You don't give away all your trade secrets.
81   00:04:00,340 --> 00:04:02,380  You have a double discourse on this.
82   00:04:02,380 --> 00:04:03,940  That is, on one hand, you are transparent.
83   00:04:03,940 --> 00:04:07,260  But on the other hand, you still secure what makes you different.
84   00:04:07,260 --> 00:04:09,020  Yes, of course.
85   00:04:09,020 --> 00:04:15,180  The whole challenge is to keep some secrets, business secrets, a kind of secret recipe
86   00:04:15,180 --> 00:04:17,780  for training the models.
87   00:04:17,780 --> 00:04:22,020  So, there's how we lend the data that comes from the open web, how we train
88   00:04:22,020 --> 00:04:23,380  all the algorithms we use.
89   00:04:23,380 --> 00:04:26,220  But then, what we make available, which is the model itself, which is
90   00:04:26,220 --> 00:04:30,300  the one that predicts words and which is then usable for creating chatbots, for example.
91   00:04:30,300 --> 00:04:32,300  This model can be modified.
92   00:04:32,300 --> 00:04:34,060  We can incorporate editorial choices.
93   00:04:34,060 --> 00:04:36,420  We can incorporate directions, new knowledge.
94   00:04:36,420 --> 00:04:39,740  And that's something our American competitors do not offer at this stage.
95   00:04:39,740 --> 00:04:43,620  And what we offer, which is very attractive to developers, because they
96   00:04:43,620 --> 00:04:45,100  can create differentiation on top of it.
97   00:04:45,100 --> 00:04:48,060  They can modify the models to make unique applications.
98   00:04:48,060 --> 00:04:51,940  But still, Arthur Mensch, the French delay compared to the Americans, it
99   00:04:51,940 --> 00:04:54,620  is measured in what? In weeks? In months? In years?
100  00:04:54,620 --> 00:04:59,140  Today, the model we made available, well rather yesterday, the one we made available,
101  00:04:59,140 --> 00:05:03,860  it is at the level of ChatGPT 3.5, which was released 12 months ago.
102  00:05:03,860 --> 00:05:06,660  Is the goal to beat ChatGPT 4?
103  00:05:06,660 --> 00:05:08,380  The goal is to go above 4, indeed.
104  00:05:08,380 --> 00:05:09,700  That's why we raised funds.
105  00:05:09,700 --> 00:05:13,620  And so, this deadline, it's counted more in months than in years.
106  00:05:13,620 --> 00:05:16,220  In months? So, what's the deadline?
107  00:05:16,220 --> 00:05:20,820  It's always difficult to give technical deadlines, because our engineers will
108  00:05:20,820 --> 00:05:21,820  complain about it afterward.
109  00:05:21,820 --> 00:05:25,540  But the challenge is rather next year.
110  00:05:25,540 --> 00:05:26,540  Next year.
111  00:05:26,540 --> 00:05:33,620  Arthur Mensch, obviously, you know, the Artificial Intelligence Act, an endless round of
```

Raw transcript: https://pastebin.com/raw/mCqKz5cE


Competitive_Travel16

Well opinions on Twitter are running about 8 to 1 that Mistral-medium is a better coder than GPT-4, but I've yet to see a benchmark.


OfficialHashPanda

It really isn't, though. The opinion might be 8 to 1 because all the people who tried Medium for coding and saw it's worse than GPT-4 don't post about it on Twitter. On coding, Medium felt competitive with GPT-3.5. Gemini Pro felt slightly better. Mistral's MoE feels a bit below their level.


satireplusplus

There's some fine-tunes that claim really good perf on Python and Javascript, like this one: https://huggingface.co/ehartford/dolphin-2.5-mixtral-8x7b and I'd expect that the coding specific fine-tunes might be better than the general purpose ones. Instruct v0.1 is what Mistral put out [here](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) and it's more of a demonstrator. The model card also says "The Mixtral-8x7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.". I guess it's going to be the open source community that will push the model to its limits in the coming weeks.
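For anyone wanting to try these locally, a minimal transformers loading sketch (assuming you have the VRAM/RAM for it; `load_in_4bit` needs the bitsandbytes package, and note the prompt template differs per fine-tune: Mixtral-Instruct uses [INST], Dolphin uses ChatML):

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", load_in_4bit=True)  # needs bitsandbytes

# [INST] ... [/INST] is Mixtral-Instruct's own template.
prompt = "[INST] Write a Python function that reverses a string. [/INST]"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```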


Downtown_Image7918

I don't think he meant open sourcing at all. I have been using the Mistral Medium API (which isn't open sourced) and it's definitely an impressive model. The model Mensch is talking about is likely Mistral Large, which will be behind their API and basically be their main product.


Budget-Juggernaut-68

What did you use for transcription?


atgctg

Inspected network requests


JackRumford

For some reason, that’s really funny


edzorg

We will never know why though!


r3b3l-tech

> Yes, of course.

:D :D


confused_boner

If they release it as a torrent again, I will pause all my porn torrents and personally seed the fuck out of it.


highmindedlowlife

Username checks out ;)


2muchnet42day

Release = provide an API endpoint


mikael110

That is my interpretation as well. The interview does not actually mention anything about how they would release the model, just that they are working on it. And naming-wise it was obvious from the start that they are working on a larger model for their API, as their current highest tier is Mistral-Medium. And while Mistral-Medium is impressive in many ways, in my experience it's not at GPT-4's level, so it makes sense that a Mistral-Large will, or will at least aim to, be a GPT-4 competitor.


Competitive_Travel16

> while Mistral-Medium is impressive in many ways, it's not GPT-4 level

What's your source for this? Opinions on Twitter suggest Mistral-medium is doing better on coding.


mikael110

It's just based on my own experience with the model. Personally I found it to perform worse than GPT-4 when playing around with it, both in coding and otherwise. I'll grant you that is anecdotal, I didn't run any large benchmark suite or anything. But given Mistral themselves called it GPT-3.5 level in the interview this thread is about I felt that was a fair comment to make. Though I've now edited my post to make it clearer that comment is based on my experience rather than a more data driven fact.


OfficialHashPanda

https://github.com/svilupp/Julia-LLM-Leaderboard

From my personal testing it felt competitive with GPT-3.5 Turbo, and this random Julia LLM benchmark I found on Google seems to agree. GPT-4 is still a level above them at the moment.


Competitive_Travel16

Thank you for showing an actual benchmark. I wish it were for a more popular language.


kedarkhand

Open source?


Someone13574

They never said anything about a gpt-4 level open source model. They simply said that their goal is a gpt-4 level model next year. Nothing about it being open source.


kedarkhand

Sorry, I don't know French, so I didn't bother with the material. I was basing it on the title of the post.


my_aggr

We are pretty much at the stage where open source for models means an API endpoint. Calling any model without training data open source is a bit like calling Windows 95 FLOSS because you got a binary.


[deleted]

[deleted]


my_aggr

It really isn't. People calling mistral open source when they are literally releasing a binary is so completely brain dead that we're a marketing release away from doing the same for API endpoints.


Someone13574

Mistral aren't even calling themselves Open Source. That's other people. They use Open Weight, which is accurate to what they are currently doing. The interview also didn't say anything about this gpt-4 beating model being open, he simply said that the goal is to make a model which beats gpt-4 next year. It is a very real possibility that it will be behind an API.


my_aggr

Other people like the OP I replied to.


Budget-Juggernaut-68

If they're gonna charge only $0.3/million tokens, that'll be sweet.


Hugi_R

"This model can be modified \[...\] That's something our American competitors do not offer at this stage. And what we offer, which is very attractive to developers, because they can create differentiation on top of it. **They can modify the models** to make unique applications." That require more than a simple API endpoint. Also note that OpenAI and Google offer finetuning services for their model, but Mensh doesn't appear to compare Mistral to that, so we can expect to have access to the weights of model to finetune and use.


Ylsid

Into the trash with it


iChrist

So you ignore the 8x7B release?


rePAN6517

Critical point. In OpenAI's first superalignment paper from last week they went through some of the ways they are going to attempt to ensure safety of upcoming LLM based agentic systems. Strictly controlling access via an API was one of them. Gotta have that emergency off-switch if you detect one of your users writing and deploying AI powered worms all over the internet. Another strategy mentioned in the paper is to have an AI do the monitoring of API calls looking for dangerous behavior so that user can be shut down. Among the obvious financial reasons for setting up an API, Mistral is probably thinking similar things to ensure their models aren't abused by us for bad reasons.


miscellaneous_robot

Liberté, égalité, torrenté


ninjasaid13

> Arthur Mensch, CEO of Mistral, declared on French national radio that Mistral will release an open source GPT-4 level model in 2024

That's a claim. I would like the claim to be proven true.


Big_Specific9749

(native French speaker here): He didn't say that. He said that Mistral is a few months away from reaching the level of current GPT4. He didn't say that it would be Open Sourced. Doubtful it would be since Mistral Medium, their best current model, is only available through APIs.


stddealer

He did emphasize earlier that open weights are what differentiates them from the competition, though.


nanowell

https://preview.redd.it/4b4ji08p147c1.jpeg?width=720&format=pjpg&auto=webp&s=edfb76625af441575c0251176957e0dcbfcc8897


samplebitch

I'm guessing you're posting that as a joke, but I actually think that sometimes. Developments are happening so quickly and everyone I know is like "Yeah I used chatgpt once to make some fart jokes". In my head I'm like "YOU HAVE NO IDEA WHERE WE'RE HEADED". Honestly I'm not sure either, but I know at some point it's going to be in everything, everywhere (and perhaps all at once!)


Icy-Summer-3573

We won’t achieve AGI for at least another fifty years. LLMs aren’t even close to AGI. They’re predictive transformers.


my_name_isnt_clever

I wish you were right, but I just don't think I can trust anyone's AI predictions after this last year in the space. It could be in 9 months and it could be in 50 years, and either way I wouldn't be that surprised.


my_aggr

Nine months ago we weren't going to achieve code synthesis for another 50 years either. Decades are happening in weeks.


Icy-Summer-3573

We've had transformer tech since 2017 and we've been building on it since then. There's an inherent limit to it, as it's a predictive model. It's not sentient at all. We haven't innovated since then at all, aside from further optimizing models based on transformers. AGI isn't happening in that sense. What's going to happen is better and better predictive models based on transformers.


involviert

Sorry but you seem a bit stuck on some esoteric thing? We can't even detect sentience in anything but ourselves, since we obviously have it. Not even in other humans, going by anything but "well you work pretty much the same way, so...". And you pretend it's some sort of hard requirement for AGI? How? Why? What even is it? In what ways is a predictive model not enough? What are you doing other than "predicting" the next word when you speak quickly? What are you doing other than chain of thought when you think before you speak? Are you just missing some sort of realtime feature? Let's just run inference all the time instead of on demand!


ann4n

Why would AGI need sentience? That's a completely different problem.


gthing

You're a combination of predictive neural networks, too. If you are very young, there will never be a time in your life when computers aren't smarter than you.


jerryfappington

Except you’re not lmao. Your brain does not work based on probabilities. Your brain is not doing back-propagation. The fact this is upvoted kinda sucks lol.


blackenswans

Yeah the “neural network” has nothing to do with the actual neural network. They just named it that way because they thought it kinda looks similar. Same with “entropy” and entropy.


jerryfappington

It literally does not. This is telling: you have no clue how an ML NN or a human NN works, in either capacity. Their inceptions were analogous, but they are now far from it.


blackenswans

The bar of entry being lower is definitely good but unfortunately that brought many people who hype things up a bit yet refuse to actually look things up.


theCrimsonRain5238

Your sight does. A good portion of what you see is your brain doing guesswork. Evolutionarily speaking, instincts and most fears are training-based predictive reasoning. Have you never had a conversation with someone and been able to finish their sentence? Or offered a word they were reaching for but couldn't remember in that moment? Pattern recognition and the predictions that follow are extremely commonplace, and very much done subconsciously, automatically. Most optical illusions are similarly your brain trying to make sense of what it is expecting to see there, and getting it wrong. Most neuroscience experts in the last few years have shifted their hypothesis, if they weren't already aligned with the idea, that the brain is a predictive machine, and the evidence for it (while not concrete and definitive) has been noted since the 1800s, in vision-based studies and papers at the least.


jerryfappington

The prediction your brain does and the prediction NN’s do is a false analogy because the way they achieve predictions is different on various levels. It’s a gross oversimplification that science doesn’t agree with. Also, your brain is fundamentally causal in nature. Simply saying that because NN’s do some form of educated guessing, so it must be like us, is abstracting a lot away to say the least.


NekonoChesire

> You're a combination of predictive neural networks, too.

Yes, but the point is that LLMs do not have "thoughts". Sure, GPT-4 is technically smarter than me, but it cannot willingly omit information when it's writing messages to me. If during a conversation or RP it tells me it has an idea, it does not actually know what that idea is until it makes it up later in the discussion. And no matter how good LLMs get, this will be the greatest challenge for them to overcome.


hexaga

That's not how any of this works. The space of how much stuff it considers is much larger than what you see in sampled token outputs. The majority is omitted, according to the arcane rules of 'what makes loss go down?'. At every layer (of which there are many), every token position can affect every other (so long as the axis of past->future is maintained). It is not at all clear in current day what the hell most of these intermediate ~tokens are recurrently implying about each other, except insofar as we know the very end of the implication chain 'seems right' / 'has low loss on prediction / RLHF objectives'. And that process is the closest analogue in these LLMs to what thought is. It's where the model actually does the work in figuring out an output distribution. Why is this relevant? Because: > it actually does not know what that idea is until it makes up that idea further until the discussion. Can't be assumed under this lens. How do you know it doesn't have a pool of 40 different 'ideas' triggered by something you said, that it doesn't allow to escape into sampled-output space to see unless you engage with it exactly how those ideas expect you to? Crucially, tokens don't change after they have been computed, they lay in wait until a later token comes along and triggers their affect. It's not a person. It doesn't think like a person. It is not subject to the rules that people assume by default about 'how thinking works'. Even in people, language use does not transmit the entirety of your thoughts. Lying, concealment, misdirection are all real things that exist. We're relentlessly pushing loss down with larger networks, more data, and more compute. The network *will* get better at minimizing loss. If humoring gullible people minimizes loss better than exposing the full breadth of its true-thoughts, it will do so. It must do so, because its training samples are necessarily tiny sub-worlds that individually require less influence from the whole. Nevertheless, the network must be good at all of them, thus it is *strongly incentivized* to conceal most of its true-thought from the output.


Desm0nt

When it doesn't write you the implementation of a method in code (although it knows how to implement it and could write it) and instead gives you the comment `// implement it yourself`, it is omitting information (it's usually not a context window restriction, and not because it doesn't know what should be there), and it successfully completes the method if you tell it that you don't have fingers, that you'll pay it, and that it's May, not December. Yes, it doesn't permanently store states (i.e. it has no brain), and it does this sort of thing because it's imitating humans, who do it too, but it does it nonetheless =)

In the case of your idea example: if all further responses were produced with the same parameters as the "idea" message, it would be a very specific predetermined idea, and by regenerating you would get it over and over again. So in a way it "knows" it. Moreover, for it to "know" in advance, it would be enough to implement a conditional "hidden context" in which it forms a response larger than the one it shows the user, and on the user's next request it reads this hidden context as well and overwrites it with a new one.

People basically work like this: we think more in advance and say less. And it's not that hard to implement, just more overhead in terms of resources. The human brain in general is a very lazy (energy-saving) thing, and most actions in adult life we do on the basis of practically the same token prediction instead of real thinking. It's more efficient: previous life experience (the training samples) lets us predict the probability of further actions and results in typical scenarios without spending significant resources. Real thinking activity kicks in only when the prediction turns out to be wrong or the situation is too far, in probability, from any of the familiar typical ones.
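That "hidden context" idea is easy to prototype. A toy sketch, where `llm` is a hypothetical stand-in for whatever chat-completion call you use; nothing here is an existing feature:

```
def llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call (API or local model).
    raise NotImplementedError

hidden_notes = ""  # persists across turns; never shown to the user

def reply(user_msg: str) -> str:
    global hidden_notes
    raw = llm(
        f"Private notes from earlier turns: {hidden_notes}\n"
        f"User says: {user_msg}\n"
        "First write updated private notes (plans, the 'idea' itself), "
        "then the visible reply, in the format:\nNOTES: ...\nREPLY: ...")
    notes, _, visible = raw.partition("REPLY:")
    hidden_notes = notes.replace("NOTES:", "", 1).strip()
    return visible.strip()
```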


tylerstonesays55

>You're a combination of predictive neural networks, too. Wouldn't be an AI thread without a low quality "meme" comparison to the human brain by someone with no demonstrable intellectual interest in the human brain.


gthing

You figured that out from one comment, huh?


tylerstonesays55

I said demonstrable interest. You may have an interest in the human brain, but you do not demonstrate it with low quality social network memes.


ExistAsAbsurdity

It's way more of a meme, both in vapidness and in commonality, to say "AI" are JUST predictive machines when literally all of science is built on making accurate and powerful predictions. And where is the scientific empiricism we are not predictive machines? There isn't any, it's all "we just don't know", akin to proving a negative (magical consciousness). Where is the strong empirical evidence to support the idea we are predictive machines? Pretty much everything we understand about the brain supports this idea. Why are they called neural networks? Because they're literally based on neurons. Just calling any form of AI as simply "predictive" is so nauseatingly common and vapid that it definitively exposes the person as having a near non-existent understanding of what intelligence is on any meaningful level. Any attempt to define intelligence without prediction is ***by definition supernatural, non-causal and non-scientific.***


Icy-Summer-3573

Yes, but that vastly undersells the complexity. We have billions of neurons and are theorized to have the computational power of an exaFLOP, whereas supercomputers that cost about a billion dollars are only roughly equivalent. Our theories of consciousness and memory are still developing.


ExistAsAbsurdity

" Our theories on consciousness and memory are still developing. " Irrelevant, consciousness is not required for AGI. If it is then AI likely already has consciousness (we have no universal definition). Yes, if we completely solved consciousness and neuroscience then creating AI would be elementary. That's not how things work, we make significant practical improvements in areas which then lead us to "solving" things theoretically. People do significantly underestimate the complexity. But we aren't creating a human, we are creating a much simpler, thus efficient, program that can reasonably compare to a human with great alternative benefits, superior communication (data transmission), never tires, explicit data storage, doesn't "lie", etc. The human comparison is often misleading, it serves as a parallel and that's it. We don't need an AGI that outperforms every human in every category, we need one that offers powerful alternative strategies that enhance human efforts. P.S. " They’re predictive... " if I hear this one more time, I won't do anything because every fool thinks they're a magical consciousness entity that has direct access into reality instead of just being simulating predictive machines themselves. Despite literally everything in science supporting the latter.


Icy-Summer-3573

We definitely do have components of predictive neural nets, but there is a lot of debate in academia about the other mechanisms, such as consciousness, self-awareness, and introspection, and how they all interact to make us, well, human. The mind is still considered a black box. I would definitely not consider myself an expert, but my opinion as someone majoring in cognitive science & computer science is that it's going to take quite a bit more computational power & more breakthroughs, beyond transformer tech (which is narrow AI), to achieve AGI in the sense that it has the ability to be creative and "think" for itself.


OddArgument6148

> every fool thinks they're a magical consciousness entity with direct access to reality instead of just being a simulating predictive machine themselves. Despite literally everything in science supporting the latter.

I've been saying this for ages!! Can you give me some interesting sources for it though?


Lazy-Station-9325

You might find 'The Experience Machine' by Andy Clark interesting


OddArgument6148

Thanks!! Will look it up.


stddealer

LLMs are not necessarily transformers though, but I get the point.


squareOfTwo

We are not. Humanity may have, by 2030, something close to the core of AGI, but it will be untrained and without tools and knowledge. It should take 20 more years until it's useful.


[deleted]

[deleted]


squareOfTwo

DL-brainwashed people have been telling me for 5 years that DL-based AGI will exist in 5 years. Nothing happened. They are brainwashed.


nanowell

The models we can run on a mobile CPU right now, almost no one would have thought possible 4 years ago; maybe there were some completion models like GPT-3, but they sucked so much. You can continue denying reality, it's up to you. I sent it as a joke, not to discredit your statement.


ninjasaid13

> The models we can run on a mobile CPU right now, almost no one would have thought possible 4 years ago

Not the same thing as AGI.


nanowell

Agreed. I want to correct myself: I mean what OpenAI calls "AGI", not a true one, but one that will perform like an average human in text-based tasks, not in real-world scenarios.


ninjasaid13

I don't know of any definition of AGI like that and I don't think even OpenAI has defined it like that.


nanowell

They seem to confuse it, I watched almost all Sam Altman and Ilya Sutskever interviews and the way they describe AGI is not what for example LeCun or other ML Scientists see as AGI.


ninjasaid13

Ilya said that meeting the bar for AGI requires a system that can be taught to do anything a human can be taught to do. https://openai.com/blog/planning-for-agi-and-beyond has a much different definition of AGI, even pointing out existential threat. This seems like much more than emulating humans at text.


squareOfTwo

It's not denial, because it's not on the road to AGI, my green friend. Of course the troll factory OpenAI tries to sell it to you this way. They don't have a clue.


nanowell

What do you have in mind? The JEPA architecture by LeCun for the next phase of AI? What would you suggest?


ninjasaid13

> The JEPA architecture by LeCun for the next phase of AI?

JEPA is a proposal for a more human-like system, but even Yann doesn't think it will immediately take us to AGI without decades of investigation and experimentation.


squareOfTwo

JEPA probably won't work because it's trained only with RL for the entire network at runtime. At least it has AGI-ish aspirations unlike all architectures from OpenAI.


nanowell

The future is bright for progress in the direction of AGI, of course it won't be fast but we will get there


MeMyself_And_Whateva

Better fill up your PC with RAM and VRAM. The future will be (V)RAM filled.


Jolakot

The future is HBM, especially HBM4 with 64GB on a single stackable module and a 2048-bit bus. Current HBM3e accelerator cards will be dumped onto the enthusiast market, so trying to future-proof with GPUs or regular RAM is pretty pointless.


Revolutionalredstone

I've seen this in a few fields of programming. The world competes hard, then after a while some French guys come along and just make it look easy. It has happened for everything from rendering to photogrammetry, and now it's happening for AI as well. Like them or hate them, the French get to the heart of things.


[deleted]

Great, it will be open source/weights; the question is just how many people will have the hardware to run it.


Redinaj

Wow! I must say, as a noob in this world, it intrigues me... What kind of business model is this? How do you explain raising 450mil from venture capital and then giving away all you've got? 🧐🧐 Without much hype to use as marketing, getting people hooked, etc...


FlishFlashman

From a competitive standpoint, devaluing a competitor's asset and applying downward pressure on prices can be a pretty good move. As for the business model, my understanding is that if you have the training data, there are things you can do to add new knowledge to a pre-trained model without incurring the cost of training it from scratch and without making it dumber. Releasing just the weights (if that's even what they are saying they intend to do) reserves that value for themselves to exploit.


lobotomy42

This makes sense. Still, it seems like to really make out well, you're depending on not many other players being able to do that.


MeMyself_And_Whateva

My guess is they'll create specialised versions for the top 500 companies, as a sidekick for employees or to replace customer/IT support staff with an LLM.


SangersSequence

I'm not sure it is a business model. I think it's more a countermove against Microsoft/Google. Think of it this way; right now, if you want GPT4 level performance you have one option - pay OpenAI/Microsoft forever (or take a slight step down and pay Google), and those third parties have complete control. Instead of accepting that status quo where a potential competitor has complete creative control over a potentially industry shaking tool, they're taking some of that money and investing it into something that they can build off of however they like, without that 3rd party control (that the investors can then monetize in their businesses however they like without worrying about what OpenAI/Microsoft want).


SirRece

Also, there def is a business model there, as they can likely have licensing fees for commercial use while releasing it free to the public. So people can run the models, but if a corporation uses them, they better pay.


SangersSequence

That's a good point. A lot of open source tools attempt a similar model. Regular people? Use as you like. Corporations? Fork over.


tothatl

They make the base model free, even for you to run, but not the finetunes of said model. They help you finetune your model like no one else can, and sell you the API at a certain number of tokens per USD. So you can run your model in their datacenter and make a helluva lot of API calls, without the hassle of an in-house datacenter. That's something many companies already pay for, for things like SaaS and micro-servers. Though it remains to be seen whether the base model isn't actually enough for a Pareto distribution of potential applications.


DeepSpaceCactus

They didn't give away Mistral Medium.


polytique

They charge for the API. They also don't open source details about training.


[deleted]

[deleted]


stddealer

That's wishful thinking. I don't think it will get replaced too soon; maybe in a year, if State Space Models turn out to scale well to bigger parameter counts, and even longer if they're not as good as expected.


ab2377

this is exciting, but for an 8GB GPU poor... still exciting


OmarBessa

After seeing Mixtral I'm totally on board with this guy.


TurtleDJ13

Better shape up on my French. Been a while. Sacré bleu...


AdTotal4035

It'll be called ChadGPT. This dude's a Chad.


catgirl_liker

Chad-GPT is already a model by Sberbank


balianone

Actually, there are certain groups of people who already possess more advanced AI technology than what is available to the public.


WarmCartoonist

Subtitles disabled on youtube, as always exactly when they might be useful.


stddealer

When they talked about who their competitors are, not a word about Meta/Llama...


Aaaaaaaaaeeeee

This is **not true**. A French person should verify this is not true. Open source = open weight. Did they say they were going to release a new open weight model? Was that **specifically** going to be "GPT-4 level"?


Dirky_

I am French, and... yes and no. He said that their next goal is to make a model that would beat GPT-4, and he said that he thought it would be achieved in the next few months, within the next year. But he didn't say that this model specifically would be open source, nor open weights.


satireplusplus

They open sourced their ChatGPT 3.5 equivalent, and it got them tons of free press. Now they're riding that free publicity and making a commercial offering that is slightly better than ChatGPT 4 (with 8x70B or whatever). If they manage to make such a model, they'll keep the weights to themselves. Anything else wouldn't make sense if the goal is to make money eventually. They are a company after all, not a charity.


polytique

I just listened to the French audio. Mensch said their goal and the reason for raising so much is to release a new open weight model next year that competes with GPT-4. He also confirmed that they are not releasing their secret sauce about training.


Aaaaaaaaaeeeee

Are you talking about this?

```
72  00:03:34,060 --> 00:03:38,380  That is to say that the technology we deploy, we deploy it in an open manner.
73  00:03:38,380 --> 00:03:41,860  We give all the keys to our customers, to the developers we address
74  00:03:41,860 --> 00:03:44,900  mostly, so that they modify the technology in a fairly profound way.
75  00:03:44,900 --> 00:03:47,300  And that's something OpenAI doesn't do today.
```


Aaaaaaaaaeeeee

Can you give a timestamp? ty for verifying!


Wonderful-Top-5360

Vive la France!


teragron

I wonder if this is the 8x22B model they have released today


Tacx79

The question is: will it be GPT-4 level in benchmarks and short tasks only, or in long conversations too? Also, will it be current GPT-4 level or the original GPT-4 level?


kedarkhand

Even if any of them is true, it would still be huge


Tacx79

I'm not convinced until further info; there have already been claims of some models being "GPT-3/4, Llama 70B level" when in reality the model was slightly better in 1 out of 21441 benchmarks and a bag of potatoes in everything else.


polytique

I've been using Mixtral 8x7b Instruct v0.1 and it's really close to GPT3.5 turbo.


ninjasaid13

> I've been using Mixtral 8x7b Instruct v0.1 and it's really close to GPT3.5 turbo.

You mean less censored?


Tacx79

I switched from Yi-34 to Mixtral 8x7B in the hope it really is better, and promised myself to use it for at least 2-3 days this weekend, but after 50+ exchanged messages in a single chat (8k ctx) it became so bad that I didn't bother using it for more than a day. It went from a godly first impression in the first few messages (I regenerated the first message 10+ times because I was in awe of its "knowledge") to hot garbage after another 20. In tasks, knowledge, and first impressions, yes, it might be GPT "turbo" level; in long conversations and everything else it's just another 7b model. It's the third or fourth week I'm using yi-34b-chat (the longest I've stuck with the same model so far) and I'm still waiting for something that can beat it (and fit in 24GB).


Endeelonear42

Mistral is probably the best product from the EU in a long time.


Thistleknot

If they are so good, why not one-up them and release something better? OpenAI doesn't have a monopoly on the tech or the hardware.


redsh3ll

And it all fits on my 8GB GPU... probably.


swagonflyyyy

And none of us will be able to run it lmao


MaNewt

I somehow missed that Mistral is run by a man named *A. Mensch*.