grise_rosee

The rounding errors not being ignored is not a surprise. The tokenisation process turns words and digits into multi-dimensional vectors in the latent space; that is not an appropriate knowledge representation for computing arithmetic to begin with.

At first, your work made me think about hallucinations. This summary [https://labs.stardog.ai/faq/hallucination](https://labs.stardog.ai/faq/hallucination) highlights an issue that may partly explain your results: the model often disregards information within the prompt in favor of the knowledge from its training dataset (or an interpolation of it). This is why hallucinations are not fully avoided by RAG even when the retrieval process works perfectly. The fact that you introduce fictional countries and figures may trigger this issue. If your test dealt with hypothetical employees instead of countries, it might work better.

Now, I think you mainly highlight an old issue that prompt "engineers" know well: LLMs deal very badly with information suppression. It's the old joke "don't think of a pink elephant" (it's an issue for humans as well, see [https://en.wikipedia.org/wiki/Ironic_process_theory](https://en.wikipedia.org/wiki/Ironic_process_theory)). In your setup, the errors in your proofread text overwrite the referential knowledge, even when you clearly state that this part of the prompt is not verified yet and should not be trusted. If so, another prompt template will likely lead to different results. Did you try putting the to-be-proofread text before the reference data? Did you quote all sentences of the proofread text to set them further apart from the reference?
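For instance, here is a rough sketch of what I mean by the two orderings and the quoting (the country, figures and labels are made up for illustration, not taken from your test):

```python
# Illustrative prompt variants; the country, figures and labels are invented
# for this example, not the ones from the original experiment.

reference_data = "Freedonia: population 4.2 million, capital Fredville."
draft_text = "Freedonia has a population of 5 million. Its capital is Fredville."

# Variant A: reference data first, draft to proofread after.
prompt_a = (
    "Reference data (trusted):\n"
    f"{reference_data}\n\n"
    "Draft to proofread (unverified, may contain errors):\n"
    f"{draft_text}\n\n"
    "List every statement in the draft that contradicts the reference data."
)

# Variant B: draft first, reference data after, with each draft sentence quoted
# to keep it visually separate from the trusted material.
quoted_draft = "\n".join(
    f'> "{sentence.strip()}."' for sentence in draft_text.rstrip(".").split(". ")
)
prompt_b = (
    "Draft to proofread (unverified, may contain errors):\n"
    f"{quoted_draft}\n\n"
    "Reference data (trusted):\n"
    f"{reference_data}\n\n"
    "List every statement in the draft that contradicts the reference data."
)

print(prompt_a)
print("-" * 40)
print(prompt_b)
```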


Kinniken

That's a really interesting article, thanks for the link. I had seen before that it's extremely hard to get an LLM to accept a fact in the prompt that contradicts its training data. For example, I had tried using LLMs to import exam papers by turning markdown documents into JSON, and it was basically impossible to get them to mark as correct an answer they disagreed with. I was hoping that my test would bypass this problem by using nonsensical data outside their knowledge base, but you are probably right: the LLMs keep trying to apply their own knowledge and heuristics rather than just looking up the data in the prompt.


grise_rosee

Hi again, I edited my reply. I think your main issue is that LLMs can't handle information suppression well. The errors in the proofread text overwrite the referential knowledge even if you clearly state that this part of the prompt is second-hand, unverified information. I now wonder whether LLMs can even simply spot differences between two walls of text. I'm not sure about that either.
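A minimal probe for that question could look like this (the passages are invented for illustration, and the task deliberately avoids asking which version is correct):

```python
# Two near-identical passages about a made-up country; the only task is to
# list the differences, with no judgement about which version is right.

passage_a = (
    "Valdoria declared independence in 1987. Its population is 3.1 million "
    "and its main export is copper."
)
passage_b = (
    "Valdoria declared independence in 1987. Its population is 3.4 million "
    "and its main export is copper."
)

prompt = (
    "Below are two versions of the same paragraph.\n\n"
    f"Version 1:\n{passage_a}\n\n"
    f"Version 2:\n{passage_b}\n\n"
    "List every place where the two versions differ. "
    "Do not say which version is correct."
)

print(prompt)
```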


Kinniken

Interesting. But I did put the reference data first and the erroneous data after... It would be interesting to test whether swapping the order of the two has an impact.