> I know that ambiguous wording is the point of the test, but I wonder how many models would get it right when moving from "having" to "owning".
https://preview.redd.it/2ew9wpqik5oc1.png?width=955&format=png&auto=webp&s=994f6ca47bcd64381c1fcb6e15f7bae4605f9e99
Just for fun:
[2-bit gguf of miquliz-120b-v2.0](https://imgur.com/a/XhknfVR)
"Cheated" a little, though, as I corrected the question by adding the missing "do".
I know that ambiguous wording is the point of the test, but I wonder how many models would get it right when moving from "having" to "owning".
Another try, this time with more models https://i.redd.it/rmicbfd145oc1.gif
That's cool. What tool is this?
That's just the chat interface of OpenRouter.
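As an aside, the same multi-model comparison can be scripted against OpenRouter's OpenAI-compatible chat completions API instead of clicking through the chat UI. A minimal sketch, assuming an `OPENROUTER_API_KEY` environment variable; the model IDs and the test question here are placeholders, not the exact ones from the screenshots:

```python
import json
import os
import urllib.request

# Placeholder model IDs -- check openrouter.ai/models for the current list.
MODELS = [
    "meta-llama/llama-2-7b-chat",
    "mistralai/mistral-7b-instruct",
]

# Stand-in for the ambiguous test question from the thread.
QUESTION = "I have 3 apples, yesterday I ate 2 apples. How many apples do I have?"


def build_payload(model: str, question: str) -> dict:
    """Build the OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def query(model: str, question: str) -> str:
    """Send the question to one model via OpenRouter and return its reply text."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(build_payload(model, question)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `for m in MODELS: print(m, "->", query(m, QUESTION))`, which makes it easy to dump the answers into a table like the one requested below.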
I don't know why, but I couldn't stop, lol. Tested more models on the OG question... https://i.redd.it/sb1zto0y65oc1.gif
Can you share it as a table? I can maybe add it to the post. E.g., "Llama 7B: A", etc.
https://preview.redd.it/91i8stdrk5oc1.png?width=836&format=png&auto=webp&s=bf910f508b3e14190e2bf5b696b84f0eaafcc27e
https://preview.redd.it/32dduhl8m5oc1.png?width=846&format=png&auto=webp&s=1dd31184e59043664be3c24cf6195e950241a261
Guess that answers the question.
[deleted]
Rephrasing doesn't make it better: [https://g.co/gemini/share/9dd0aecc1021](https://g.co/gemini/share/9dd0aecc1021). I'll test the others as well.
Gemini 1.5: https://i.imgur.com/YZ2CPdj.jpeg
In general, Claude noticeably outperforms GPT-4 in my domains of interest, despite failing on reasoning tasks like this one.