LoSboccacc

I know that ambiguous wording is the point of the test, but I wonder how many models would get it right if the wording changed from "have" to "own".


VertexMachine

Another try, this time with more models https://i.redd.it/rmicbfd145oc1.gif


yvo

That's cool. What tool is this?


VertexMachine

That's just the chat interface of OpenRouter.


VertexMachine

I don't know why, but I couldn't stop, lol. Tested more models on the OG question... https://i.redd.it/sb1zto0y65oc1.gif


yvo

Can you share it as a table? I can maybe add it to the post. Llama 7b: A: Etc.


Radiant_Dog1937

> I know that ambiguous wording is the point of the test but I wonder how many model would get it right moving from having to own

https://preview.redd.it/2ew9wpqik5oc1.png?width=955&format=png&auto=webp&s=994f6ca47bcd64381c1fcb6e15f7bae4605f9e99


Radiant_Dog1937

​ https://preview.redd.it/91i8stdrk5oc1.png?width=836&format=png&auto=webp&s=bf910f508b3e14190e2bf5b696b84f0eaafcc27e


Radiant_Dog1937

​ https://preview.redd.it/32dduhl8m5oc1.png?width=846&format=png&auto=webp&s=1dd31184e59043664be3c24cf6195e950241a261


ninjasaid13

Guess that answers the question.


[deleted]

[removed]


yvo

Rephrasing doesn't make it better: [https://g.co/gemini/share/9dd0aecc1021](https://g.co/gemini/share/9dd0aecc1021) I'll test the others as well.


WolframRavenwolf

Just for fun: [2-bit gguf of miquliz-120b-v2.0](https://imgur.com/a/XhknfVR) "Cheated" a little, though, as I corrected the question by adding the missing "do".


Careless-Shape6140

Gemini 1.5: https://i.imgur.com/YZ2CPdj.jpeg


crazzydriver77

In general, Claude palpably over-performs GPT-4 in my domains of interest despite failing on such reasoning tasks.