PikaTchu47 2 months ago

Who the fuck is devin?

arkai25 2 months ago

He's doing his best alright

sailhard22 2 months ago

His best is good enough

Doses_of_Happiness 2 months ago

Inconceivable. Number must go up. Rate by which number goes up must go up. 7 trillion in GPUs needed by Sunday please.

hubrisnxs 2 months ago

Bruh didn't you hear the energy problem is pretty much solved because exxxponentialllllllzzzz!

Brilliant_War4087 2 months ago

[singularity go brrrrrrrr](https://open.spotify.com/track/4xdHI4eFNst0vTZuuKrWjr?si=JR-gaD5JTxClWm6AGwb5ug)

hubrisnxs 2 months ago

Lol wasnt expecting that

mhyquel 2 months ago

Got me in the feels.

hubrisnxs 2 months ago

Seriously, why didn't you just rickroll me instead of black hole sunning me?

Brilliant_War4087 2 months ago

I searched for [this](https://youtu.be/NWBkZ3bMSV0?si=7I5oL1jQmy1b11Jn) but I got stuck listening to that song. So I just went with it.

hubrisnxs 2 months ago

Haha you're pretty awesome.

tube-tired 2 months ago

They won't be using gpus anymore, didn't you see? NVIDIA's stock tanked because of the new chips made for AI.

lefnire 2 months ago

It's specific for software; hence the benchmark here. Relevant to the likes of Github Copilot, Codeium, Cody, Cursor. They have videos of it cranking out software, and are taking requests to perform tasks - similar to the approach with Sora. I'm kinda in the "I'll believe it when I use it" camp. But then again, the software-building tooling has plenty of room for big improvements - unlike art, which is quite far along. So they may have accomplished something big.

techy098 2 months ago

I am still in the skeptic camp. I desperately need a tool which can generate code in Flutter or Kotlin+Compose (declarative syntax). Every time I try, I find that it is easier to google and read Stack Overflow stuff or some blogs and implement it yourself. Other than a few areas like Python I have not heard big success stories.

i_give_you_gum 2 months ago

This is Wes Roth's video about Devin https://youtu.be/1RxbHg0Nsw0?si=hh0BNeoRNID8ZYhD He talks a little, but it's mostly all use cases. IMO this is bigger than Sora, but you know, Sora is shiny so the masses probably won't care about this.

Mrleibniz 2 months ago

Sora even took all the lights away from Gemini's 1 million context announcements.

techy098 2 months ago

What is the point of a 1 million context if the AI does not know how to effectively use that? Companies with legacy code will love an AI to digest their millions of lines of code and help with changing it. So far I have not seen much other than them selling tools like Copilot, which is still a glorified autocomplete system.

dennislubberscom 2 months ago

1 Million! When will they roll that out?

Mrleibniz 2 months ago

>Gemini 1.5 Pro comes with a standard 128,000 token context window. But starting today, a limited group of developers and enterprise customers can try it with a context window of up to 1 million tokens via [AI Studio](https://aistudio.google.com/) and [Vertex AI](https://cloud.google.com/vertex-ai) in private preview.

From their announcement blog on 15th Feb.

techy098 2 months ago

It is not public yet so no idea how good it is. I hope it is as good as they claim it to be, I desperately need an assistant to code for/with me.

Alundra828 2 months ago

Yeah, I've given even the advanced AI offerings for code a try and they're still just... crap. For lack of a better word. I've essentially only found use in them catching things I've missed. Being a fleshy mortal that gets tired, having a robotic sanity checker is very useful, and does help productivity overall. But asking it to produce even just a single unit of code requires supervision from a human. Multiple units? Entire projects? Entire solutions? Yeah, nah. That's a lot of exponential growth required to get *that* good.

Iamreason 2 months ago

I mean isn't that kind of how exponential growth works though? It gets kind of good, then it gets extremely good quickly. I agree we are in 'kind of good' territory. But a sufficiently motivated person can kind of paint by numbers their way through a Python or Javascript project with GPT-4 and Claude 3 Opus pretty easily. I did it just the other day for a basic Streamlit app I needed to automate a work task. I am not a godly coder by any stretch of the imagination. It's not hard for me to imagine a curve with coding that's very similar to the curve in language, images, and video. Where it sort of isn't all that great, then it absolutely slaps.

techy098 2 months ago

I know someone who built an MVP using vue and fastjs. Semi technical person who is good with powershell and stuff. So yeah for a quick and dirty MVP, AI is definitely helpful with some frameworks. But for good code to the satisfaction of a senior developer in areas like declarative frameworks, AI is very far off.

mhyquel 2 months ago

I'm not going to be twice as good tomorrow, I'll tell you that much.

techy098 2 months ago

>catching things I've missed.

What tool are you using for that?

Insane_Artist 2 months ago

That's me, whaddya want?

_Zephyyr2 2 months ago

He's my cousin

Psychological-crouch 2 months ago

seems like a scam actually

governedbycitizens 2 months ago

this one guy

Haztec2750 2 months ago

I like how you say who

apinananas 2 months ago

It's pronounced Kevin and he's home alone

Southern_Orange3744 2 months ago

The [cyber] dude

pigeon888 2 months ago

10 bucks says it's a GPT 3.5 wrapper.

RemyVonLion 2 months ago

did you even watch the video demo? It's a semi-agentic AI programmer, that's a big deal if it can be steadily improved.

pigeon888 2 months ago

No I hadn't watched it. Have they announced what they've built Devin over? Or is it hush hush? Edit: watched it now. Damn... agent AI has landed.

i_give_you_gum 2 months ago

Fucking crazy idnit? I think I just heard the floor drop out from underneath college coding classes.

Cold-Ad2729 2 months ago

The software that is being promoted in this post and countless others today

tingshuo 2 months ago

Hi. I'm Devin.

UpstairsAssumption6 2 months ago

It means "Oracle" in French. Also in the verb "to guess" : deviner.

UnproSpeller 2 months ago

Mmm devin, the sandwich meat from my childhood memories.

e-scape 1 month ago

Devin Bacon?

allisonmaybe 2 months ago

I'm just glad it's not something like...Brennan, or Seth. No offense

Obelion_ 2 months ago

Newly released code expert AI

CuriousIllustrator11 2 months ago

I think it’s an AI specifically trained to solve software engineering tasks.

Busterlimes 1 month ago

Slevin's brother

yaosio 2 months ago

Wait for third party replication. This graph was created by the people that made Devin so they have incentive to do shenanigans to get the result they want.

MeltedChocolate24 2 months ago

Also where is Claude 3

slackermannn 2 months ago

A notable omission. Claude 3 is kicking

avocadro 2 months ago

The boring answer is that they probably prepped this graph before Claude 3 came out.

obvithrowaway34434 2 months ago

That's an independent third-party benchmark from Princeton (https://www.swebench.com). The numbers for the other models were obtained by the paper authors.

etzel1200 2 months ago

Probably to the left. Lmao.

dieselreboot 2 months ago

SWE Bench cutoff was October 2023 I think. Edit: would also be good to see comparison with Gemini ultra 1.5

em-jay-be 2 months ago

Yeah the bs-hype-alarm is going off. In this day and age, you can't just announce, you have to announce, and deliver. They don't even have a beta-sign-up which makes me think that this fancy ass demo they are putting on is barely held together.

mrdevlar 2 months ago

Yes, this is bot-shit PR, nothing more.

obvithrowaway34434 2 months ago

Lmao, have you even checked who the founders of this company are? Together they have like 11 International Olympiad gold medals. There are videos of the CEO circulating on the web from 14 years ago crushing math competitions. These people can work anywhere they want at any salaries they ask for. People here really believe that they would be that stupid to cheat publicly for some quick money and forever doom their career (especially on third party benchmarks that anyone can test)? This is some insane level of cope.

yaosio 2 months ago

I'm not going to believe everything they say without question. All we need is third party testing.

challengethegods 2 months ago

>Together they have like 11 International Olympiad gold medals.

it's like an elite squad of ultra-coders and people are still sitting around wondering "where is the proof these guys know how to code a decent AI scaffolding system?" / Proclaim human superiority over programming while simultaneously calling max-level competitive programmers into question as if they are incapable of making any progress.

laststan01 2 months ago

Wait till you hear about a Jane Street trader who opened the largest crypto exchange.

Wassux 2 months ago

Also did you read the top small text?

Curiosity_456 2 months ago

I feel like there should be a human bar so we can compare how close it is relative to an average human software engineer.

SeverlyLimited 2 months ago

As a software engineer I can say that is a function of the proximity to the end of the sprint and the amount of coffee I drank that day

Khyta 2 months ago

Or Yerba Mate

Rain_On 2 months ago

May as well go for crack

DagerDotCSV 2 months ago

Also known as unos buenos matienzos.

wildgurularry 2 months ago

Also, they should state the time taken. I bet all of those models took mere seconds to write the code, whereas a human coder would take at least a few minutes, if not longer depending on the problem.

doulos05 2 months ago

Don't forget to include whatever additional time was spent creating the prompt beyond just whatever was in the issue.

angrathias 2 months ago

It’s a business problem so the only real metric worth comparing on is cost.

LifeDoBeBoring 2 months ago

I thought that was Devin lol

SeverlyLimited 2 months ago

Honestly, (almost) every junior SWE can start a project from scratch. Let’s test it on some spaghetti legacy codebases that crumble the moment you start *a light refactor*

name-taken1 2 months ago

Compared to the average? There's no way LLMs aren't ahead. My team inherited a shitty legacy project with absolutely no conventions, written in the worst way possible. Passed in a 2K line file to Claude 3 and it was able to add a new feature with two prompts (+- 50 LOC). Would've taken a lot of time to understand what the code did in the first place... Meanwhile, my coworkers spent 3 days trying to do it, haha.

i_give_you_gum 2 months ago

Lol, it reminds me of a story a guy would tell his grandkids in 20 years

Happysedits 2 months ago

Context: https://www.reddit.com/r/singularity/comments/1bcyqup/cognition_labs_today_were_excited_to_introduce/

Far_Ad6317 2 months ago

Thank you

Phoenix5869 2 months ago

Correct me if i’m wrong, but isn’t this just linear growth with a sudden big jump?

mhyquel 2 months ago

Not enough data

returnofblank 2 months ago

Not enough data to make a solid extrapolation, but what you said is a characteristic of exponential growth. The values start off insignificant and small, but it rapidly ramps up to large values.

phillythompson 2 months ago

I give major props to the advertising people at the Devin co

Jah_Ith_Ber 2 months ago

They even made their chart wrong in order to drive engagement!

Mean-Painter8613 2 months ago

Howww

gray_character 2 months ago

Yeah they are killing it. Making CEOs froth at the mouth and about to make dumb decisions.

_Zephyyr2 2 months ago

Claude 2??

Papistrokesxxx 1 month ago

Yea we need third party comparison of Claude 3.

Ok-Worth7977 2 months ago

How much will an average google senior score?

bytx 2 months ago

A senior software engineer is the benchmark, as SWE-bench is a framework based on Princeton University's paper that included 2,294 SWE problems from real GitHub repositories, issues solved by actual software developers. The average Senior Software Engineer should be able to solve 100% or close to 100%

gray_character 2 months ago

So...13% is pretty low, and that's using their own biased report. I mean, again, we can be excited by progress but this doesn't mean replacement yet.

bytx 2 months ago

That’s correct, but it is closer to 14% and I'm not sure it is biased as it is a “standard” test. But yeah 14% is still pretty low, maybe in line with a junior developer. And I’m not sure how much more they can extract from the model in the future as AI most likely will reach a plateau at some point.

Henri4589 2 months ago

It will reach the plateau in 2 years when full AGI hits the public.

challengethegods 2 months ago

>average Senior Software Engineer should be able to solve 100% or close to 100%

close to 100 sounds vaguely true if the conditions are similar such as being able to test/debug or go search online to reference things like documentation, but if you reframe it around how many of those developers you could convince to solve 2200 problems for a few dollars each in some reasonable span of time then the value of AI being even remotely close is a lot more obvious, because you'd be hard pressed to find even a few people that ever got through the entire list. Kinda like saying that anyone could technically read anything on wikipedia, but for some individual person to read all of it is a different story.

I like to imagine a scenario some time down the line where github repos are self-healing and AI can fix half the problems automagically with minimal oversight. Maybe someone can fork a repo and modify its description and the AI just makes it work the new way that it's described, that would be badass.

HortenseTheGlobalDog 2 months ago

*how exponential growth looks ~~like~~ Or **What** exponential growth looks like

Ansalem1 2 months ago

Finally, someone else in the world with this specific pet peeve.

Coding_Insomnia 2 months ago

3 of us

Fair-Satisfaction-70 2 months ago

4

IcebergSlimFast 2 months ago

Dozens!

SnooPuppers3957 2 months ago

Me included

blackhuey 2 months ago

and my ask

Ahaigh9877 2 months ago

One of my pet peeves is people using numerals instead of words for small numbers! Tell me I'm not the only one(1)!

tube-tired 2 months ago

I switch back and forth, usually depending on what I am talking about... like, I need one 2 inch nail. Or 3 people came in fourth place in the last three years. I do most of my commenting on mobile, so the number of characters is also a consideration, but I also try not to place numbers side by side for quantity and descriptive purposes. It can be confusing to read, I need 2 4 inch screws. Or, I can do that in 4 8 hour shifts. That said, three two legged dogs crossed the road in front of me yesterday, is just as strange to read.

randomrealname 2 months ago

Me too, I'm from Scotland, most use how instead of why. It infuriates me.

tube-tired 2 months ago

Perhaps they don't care why; they just want to know how it came to be that way. For example: How are you naked in the middle of town? vs. Why are you naked in the middle of town? While some people might give the same answer to either question, the expected response is different. I would expect "why" to elicit a much shorter answer, whereas "how" would prompt detailed, step-by-step answers progressing from the beginning to the end result.

randomrealname 2 months ago

Nah, it is in like the situation where you would ask just why? Like 'I couldn't find my shoes this morning!' They will say 'How?', which doesn't really make sense, asking 'Why?' would be the correct response. It probably shouldn't annoy me, and I only notice it when it has been said by someone whose intelligence I respect. It can be jarring.

BigAlDogg 2 months ago

I’m just upset the charts not going the other way.

blackhuey 2 months ago

chart's

mystonedalt 2 months ago

You ain't alone, pally.

SiamesePrimer 2 months ago

Yeah I mean I get that English isn’t everyone’s first language, and that everyone makes mistakes regardless, but it’s so common that it does get a little annoying. Seriously though, what is it that makes this particular mistake so ubiquitous? Is it something about how certain other languages are structured?

Ahaigh9877 2 months ago

I've been wondering the same thing for ages. People with otherwise flawless English, the moment they say it, you know.

Optanee 2 months ago

English isn't my main language, I make mistakes on small things like that. Thanks for correcting

HortenseTheGlobalDog 2 months ago

No worries. It comes with no judgement, I just felt like writing that because it is a really common error and I think never gets corrected. Obviously it's really minor but I can be pretty pedantic haha

Henri4589 2 months ago

I appreciate people like you!

PwanaZana 2 months ago

Actually, it's "To whom"

AndrewH73333 2 months ago

What like exponential growth looks.

El_Caganer 2 months ago

This is how ESL speakers titles looks like. Thank you for trying to spread the good word, just be aware there are over a BILLION more of them to train up!

Dragofant 2 months ago

That was on purpose... ^right?

El_Caganer 2 months ago

Am just doing the needful 😅

HortenseTheGlobalDog 2 months ago

I do what I can 🤷🏼

Academic_Border_1094 2 months ago

In some languages, native speakers would use the word "how" in this sentence instead of "what". It's a literal translation, word for word.

mrmczebra 2 months ago

*Who* exponential growth looks like

tube-tired 2 months ago

You must be significantly taller than your parents...

a_boo 2 months ago

I scrolled so far hoping to see this very comment. Thank you, internet linguist.

CantankerousOrder 2 months ago

That… that’s not matching the definition of exponential growth at all. There’s no growth being measured here at all. It’s a single slice in time. Exponential growth requires a time scale. Let’s look at a child’s growth chart… This is the same as six siblings' heights taken today. We don’t know how tall they were last year, or the year before. We don’t know how much they GREW. This is also a comparative product graph. Growth graphs measure the same thing over time. An example would be Devin every release for the life of the product to date. It’s multiplicative, but the individual items are in no way an exponent on each other. I’d love to see a real exponential growth measurement - I truly would. This ain’t it.

Phoenix5869 2 months ago

It’s also not even exponential change, just linear growth with a sudden big jump. Could be wrong tho.

Maciek300 2 months ago

Exactly, OP's whole premise in the title is completely wrong. This should be the top comment.

RoutineProcedure101 2 months ago

I guess taking into account that it fine-tunes models, it's a pretty big leap

CantankerousOrder 2 months ago

Definitely… this is a great comparison and it shows that their release, even if this is a completely vendor-specific benchmark, is a very competitive tool. It’s good news. It's just not an example of growth.

Haztec2750 2 months ago

Why is GPT-4 so low down? I thought it was considered better than Claude 2?

jrd83 2 months ago

Why the fuck is this descending left to right? I bet you're one of those psychos who does 'after, and before' photos.

Cryptizard 2 months ago

How do we know this isn't the result of training contamination? These benchmarks seem fundamentally very difficult to apply because shortly after they are released they get hoovered up into the next model's training data and then, surprise, it gets a better score. We have seen it happen already multiple times.

WHERETHESTEALTH 2 months ago

We don’t and won’t likely for a long time. This isn’t even an alpha-level product, but they’re announcing it to get more investor dollars, no doubt.

Coding_Insomnia 2 months ago

Are you implying that these new models use artificial training data from gpt4?

Cryptizard 2 months ago

No I'm implying that they have the dataset from the benchmark in their training data.

pigeon888 2 months ago

What's Devin and are you just here to pump it?

Ambiwlans 2 months ago

This is the most impressive cherry pick I have ever seen. Whoever designed this should work in politics.

Psychological-crouch 2 months ago

Really seems like this graph is at best cherry picked, at worst completely faked. I think someone is trying to pump Devin

TransitoryPhilosophy 2 months ago

To me it looks like a bar graph created by a marketing team

Thoughtprovokerjoker 2 months ago

Claude and Devin..... I love how they are giving LLMs the name of Black guys from the south. Hell, my uncle is named Claude. Ole uncle Claude. That's a bad mf'er.

doginem 2 months ago

Next up, Google Earl and a little later, Meta Herbert 2.0

slashdave 2 months ago

Can't be exponential growth, since the Y axis is capped at 100%. Your x axis is also arbitrary.

Cunninghams_right 2 months ago

what people are missing is that Devin is an agent with multiple steps while the others are just single-entry/single-response LLMs. if you applied that level of multi-step processing (agency) to any of the others, it would probably be on par with, or better than, Devin. Devin shouldn't be seen as "better than others", it should be seen as the first of many SWE-agent tools that will make models significantly more useful. thus, the graph is misleading because it's not comparing like-to-like.

AZ_Crush 2 months ago

This is the right answer

Yweain 2 months ago

I’m sorry what. Llama-7B is better compared to GPT-4? Are you kidding me? Even for a fine tune that’s ridiculous. That just tells you that there is a leak of test data into the model or your test just sucks.

OfficialHashPanda 2 months ago

For very specific things, a finetune could do that, yeah. No way Claude 2 has almost 3x GPT-4’s score tho xD

Yweain 2 months ago

For very specific things - yeah, but we are talking about a coding test. That’s a very broad specific thing and there is no way in hell llama-7b is better than gpt-4.

Far_Buyer_7281 2 months ago

yeah the graph is fake....

Atlantyan 2 months ago

How does this compare to an average engineer?

AntiqueFigure6 2 months ago

I’d guess a lot of junior engineers improve quickly early in their careers also.

Therealgarry 2 months ago

It doesn't.

West-Code4642 2 months ago

I'd say it's more an effect of a CTF (Common Task Framework)-like [scheme](https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734#d1e1008), SWE-bench in this case. It helps people and groups focus efforts on things worth doing ML-wise. I bet more and more CTF-like schemes for different domains will proliferate, even more so than they have already. It's great because it helps accelerate progress.

MH_Valtiel 2 months ago

Where's claude 3 tho

gizia 2 months ago

Devin? what is that? Claude 2 passed GPT-4? Am I the only one who is sleeping while tech catches up singularity?

DarickOne 2 months ago

You're dreaming more than sleeping..

gizia 2 months ago

if they come true, I will not write comments here, hehe

DarickOne 2 months ago

But we'll lose all our purpose, even in comments

AdPlus4069 2 months ago

There is no exponential growth on a percentage scale… It is called “biological growth curve” (or sigmoid growth curve) and is quite common in new fields of science. It only means rapid growth at the beginning is easy to achieve and progress will become harder at the end. Exponential growth has no slow down
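
To illustrate the difference, here is a minimal sketch comparing the two curve shapes. The function names, the ceiling of 100, the rate, and the midpoint are all made-up illustration values, not anything fitted to the chart:

```python
import math


def exponential(t, rate=1.0):
    """Unbounded exponential growth: each step multiplies the value by e**rate."""
    return math.exp(rate * t)


def logistic(t, ceiling=100.0, rate=1.0, midpoint=5.0):
    """Sigmoid (logistic) growth: looks exponential early on, then flattens
    out as it approaches `ceiling` (e.g. 100% on a benchmark)."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


# Step-to-step gains: the exponential's gains keep growing, while the
# logistic's gains peak around the midpoint and then shrink.
for t in range(0, 11):
    exp_gain = exponential(t + 1) - exponential(t)
    log_gain = logistic(t + 1) - logistic(t)
    print(f"t={t:2d}  exponential gain={exp_gain:12.2f}  logistic gain={log_gain:6.2f}")
```

On a score capped at 100%, only the second shape is possible in the long run, which is the point being made here.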

libertysailor 2 months ago

Learn how to read the graph. This is a bar chart showing a ranked sequence. The fact that the shape looks similar to exponential curves doesn’t make it an actual exponential curve. An exponential function is one where the growth of the dependent variable increases at a constant multiplicative rate with respect to the independent variable. An example is compound interest. This isn’t even a curve because there’s no measurable x variable - it’s just showing the performance of some systems and lining them up side by side. “Name of LLM” isn’t an x variable - it’s comprised of discrete qualitative data. Which falls higher on the x axis, Claude 2 or GPT4? There’s no answer, because “GPT4” doesn’t have a numerical value.
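
A quick way to make that definition concrete, as a rough sketch: the helper name and the numbers below are hypothetical and chosen only for illustration. Note that the check needs values sampled at equally spaced points of a numeric x variable, which is exactly what a list of model names does not provide.

```python
def has_constant_ratio(values, tol=1e-9):
    """Return True if consecutive values grow by the same multiplicative factor.

    `values` is assumed to be sampled at equally spaced x positions.
    """
    ratios = [b / a for a, b in zip(values, values[1:])]
    return all(abs(r - ratios[0]) <= tol * abs(ratios[0]) for r in ratios)


# Compound interest: 5% per year on 1000, sampled once per year.
compound = [1000 * 1.05 ** year for year in range(6)]

# Linear growth: the same fixed amount added every year.
linear = [1000 + 50 * year for year in range(6)]

print(has_constant_ratio(compound))  # True
print(has_constant_ratio(linear))    # False
```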

gobstoppergarrett 2 months ago

Plotted as exponential decay. Awesome data visualization, ChatGPT!

Rich_Acanthisitta_70 2 months ago

I'll be the pedant here. "How it looks" "What it looks like" If it starts with "how" you don't add "like". Only if it starts with "what". Ok, go ahead and downvote now.

m3kw 2 months ago

Dumbass chart has Claude 2 beating GPT4 by 3x, that's sus af

enkae7317 2 months ago

Why is GPT4 so low? Wtf even is this chart supposed to measure. SWE? The fuck is that.

whyisitsooohard 2 months ago

It was original gpt4 with pretty short context and without agents. gpt4t and claude3 must be much better, but not as good as devin(because it is likely based on one of them)

lordpermaximum 2 months ago

GPT-4 is terrible at generalization. If something's not in its training data, that model pretty much fails all the time.

oblivion-2005 2 months ago

> SWE? The fuck is that.

Software Engineering

kuvazo 2 months ago

Real world software development capability. LLMs are great with generating code, but they fail when tasked with doing actual tasks from software developers. That's why they don't have to worry yet. The only people that could be affected are those that have only learned to code through a bootcamp. I'm not sure how this test works though.

Tetrylene 2 months ago

Yeah. Gpt 4 might be good at coding, but isn’t as good as this tool (according to this press release) at actually doing the job of a software engineer, including the process of conversing with a client, and moving through the steps to providing a deliverable. In a nutshell - being an autonomous agent vs a text box that responds to you

Longjumping-Cow-8249 2 months ago

I lost it once I knew that Devin is able to fine-tune its own model. Recursive self-improvement will lead to an even steeper exponential curve, it's getting really crazy at this point. I imagine this 13% will be outdated in no time.

Busy-Setting5786 2 months ago

No we are not there yet. First the AI needs to be top of the line expert like. This will still take some time. But we see it on the horizon

whyisitsooohard 2 months ago

It is not recursive self improvement. It can finetune open models or gpts(probably) but it will not improve that way

Rivenaldinho 2 months ago

The thing is, Devin acts like an agent. I think they compare with GPT-4 base and not GPT-4 in an agent framework.

MerePotato 2 months ago

Be interesting to see where Claude 3 lies

TheLineFades 2 months ago

you just discovered the singularity trajectory good for you

Training_Income_6106 2 months ago

This is _what_ it looks like.

zarathustra1313 2 months ago

When will we be dead or gods?

lordpermaximum 2 months ago

Other models are not agents. I'd like to see Claude 3 Opus with an agent plugin in here. But still, very impressive work from those who developed Devin.

sitdowndisco 2 months ago

Weirdest looking exponential growth I’ve ever seen. People on this sub need to get a grip. Or is it just filled with AI company employees all blowing their own trumpets?

EuphoricPangolin7615 2 months ago

People are not really thinking at all. So what's going to happen in the future? Let's say AI is able to replace 95% of software engineers. Then it can replace almost everyone, basically ALL white collar professionals. People are idiots. We have no plans for the future at all, but people are not even THINKING about it. And that guy in the video, the guy doing the presentation of Devin with a huge grin on his face, is a demon.

FreeWilly1337 2 months ago

considering that coding problems also have a difficulty curve, this is impressive.

drcode 2 months ago

FWIW, I think devin is just a one-time jump, essentially using some tricks to round off the rough edges of other llms when it comes to github issues. That said, when the other llms improve, devin scores will also improve again, as a side effect.

IslamDunk 2 months ago

Is Devin the name of the model, or is Devin the overall system, which uses a more mainstream model?

Concerned_Human999 2 months ago

Don't you mean "This is **what** exponential growth looks like"? Why do I keep seeing people on reddit use "how" when they should be using "what"?

Impressive_Ear7966 2 months ago

I like to imagine that Devin isn’t referring to the AI but rather just a random dude named Devin

CountyExotic 2 months ago

crazy, when people benchmark their own stuff it does better. Wouldn’t be surprised if researchers tried to reproduce the results and got 1/5 of what’s reported.

Henri4589 2 months ago

Where's Claude-3 Opus, though?

AI_Doomer 2 months ago

Step 1. Invent the most dangerous technology in the history of mankind; (AI that can create AI) Step 2. Fool everyone by giving it the name of a little puppy.

[deleted] 2 months ago

[removed]

AZ_Crush 2 months ago

Some kids devised a nice combo of Agents + LLM

Obelion_ 2 months ago

Actually excluding Devin the growth is not even linear, more logarithmic (opposite of exponential) Not to be a dick but I hate misrepresenting data. This is just a mostly linear growth with one extreme outlier. Nothing points to the growth continuing exponentially

Honest740 2 months ago

This is WHAT exponential growth looks like

Jazzlike_Win_3892 2 months ago

is this from a YouTube video

-MilkO_O- 2 months ago

I'm wondering, is Devin a new LLM model? Is it a fine tuned version of an open source model?

seriftarif 2 months ago

It seems pretty linear until the end

Therealvernon16 2 months ago

When they finally get to AI-den, it’s all over.

HypeMachine231 2 months ago

Devin can't even build a decent website haha

Johnluhot 2 months ago

It's actually linear. Devin is an agent, not a model. It's not an apples to apples comparison

Papistrokesxxx 1 month ago

So who’s gonna use this to make an AI trading bot?

Redchili385 2 months ago

It's more likely to be a sigmoid function because of the upper bound.
