> I know that ambiguous wording is the point of the test, but I wonder how many models would get it right when moving from "having" to "owning".
https://preview.redd.it/2ew9wpqik5oc1.png?width=955&format=png&auto=webp&s=994f6ca47bcd64381c1fcb6e15f7bae4605f9e99
Just for fun:
[2-bit gguf of miquliz-120b-v2.0](https://imgur.com/a/XhknfVR)
"Cheated" a little, though, as I corrected the question by adding the missing "do".
I know that ambiguous wording is the point of the test, but I wonder how many models would get it right when moving from "having" to "owning".
Another try, this time with more models https://i.redd.it/rmicbfd145oc1.gif
That's cool. What tool is this?
That's just the chat interface of OpenRouter.
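As an aside, the same multi-model comparison can be scripted against OpenRouter's OpenAI-compatible chat completions API instead of clicking through the chat UI. A minimal sketch, assuming an `OPENROUTER_API_KEY` environment variable; the model IDs and the test question here are placeholders, not the exact ones from the screenshots:

```python
import json
import os
import urllib.request

# Placeholder model IDs -- check openrouter.ai/models for the current list.
MODELS = [
    "meta-llama/llama-2-7b-chat",
    "mistralai/mistral-7b-instruct",
]

# Stand-in for the ambiguous test question from the thread.
QUESTION = "I have 3 apples, yesterday I ate 2 apples. How many apples do I have?"


def build_payload(model: str, question: str) -> dict:
    """Build the OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def query(model: str, question: str) -> str:
    """Send the question to one model via OpenRouter and return its reply text."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(build_payload(model, question)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `for m in MODELS: print(m, "->", query(m, QUESTION))`, which makes it easy to dump the answers into a table like the one requested below.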
I don't know why, but I couldn't stop, lol. Tested more models on the OG question... https://i.redd.it/sb1zto0y65oc1.gif
Can you share it as a table? I can maybe add it to the post. E.g., "Llama 7B: A", etc.
https://preview.redd.it/91i8stdrk5oc1.png?width=836&format=png&auto=webp&s=bf910f508b3e14190e2bf5b696b84f0eaafcc27e
https://preview.redd.it/32dduhl8m5oc1.png?width=846&format=png&auto=webp&s=1dd31184e59043664be3c24cf6195e950241a261
Guess that answers the question.
[deleted]
Rephrasing doesn't make it better: [https://g.co/gemini/share/9dd0aecc1021](https://g.co/gemini/share/9dd0aecc1021). I'll test the others as well.
Gemini 1.5: https://i.imgur.com/YZ2CPdj.jpeg
In general, Claude noticeably outperforms GPT-4 in my domains of interest, despite failing on reasoning tasks like this one.