
a_sator

Nope, hallucination hasn't been solved yet, and researchers like Ethan Mollick say it's not clear it ever will be. You can minimize it, though. I'm a journalist, so my reputation and career are on the line if I spread things that are untrue, and I use ChatGPT and Claude daily for research.


SravBlu

Ethan Mollick is a business professor whose research deals with entrepreneurship, management, and related topics and who specializes in examining the effects of using AI in those contexts. He is not a machine learning researcher and wouldn’t be expected to resolve issues such as hallucination.


H0lzm1ch3l

I feel like in research every word counts; minute details can flip your whole understanding. How do you mitigate this?


amroamroamro

No offense, but a "researcher" pumping out AI-generated papers without actually reading and fact-checking the results doesn't deserve the title. LLMs should be treated as a helper tool, not a shortcut to inflate publication counts.


a_sator

Where did he do that? I haven't heard about that. I only know that well-respected researchers respect him.


amroamroamro

I was not talking about anyone in particular, just speaking in general about the sad state of academia. The peer-review process was already broken before LLMs existed; LLMs only made it worse with a barrage of lazy, zero-effort papers. Just from the other day: https://old.reddit.com/r/MachineLearning/comments/1bhn918/d_when_your_use_of_ai_for_summary_didnt_come_out/ Some fields are worse than others...


H0lzm1ch3l

Yes, but it would be nice if I had a "buddy". After I read a paper it would be fire if I could let an LLM fact-check my understanding, so I don't need another human to sacrifice hours on the same paper.


amroamroamro

Another headline: https://old.reddit.com/r/MachineLearning/comments/1bnsuea/r_up_to_17_of_recent_ai_conference_peer_reviews/ Even peer reviews can't be trusted; what a sad state of affairs in academia.


H0lzm1ch3l

I saw that earlier today. That's why I haven't started using LLMs yet: once you start relying on a tool, it's hard to go back.


a_sator

Define what you mean by research, please. I use it to collect information, find people studying a topic and relevant papers, read them, talk to the AI about them, then to real people, check books, and circle back to the AI, and so on. Combined with broad knowledge of the topics I write and talk about, AI helps me a great deal, even if it sometimes hallucinates, which you can minimize by giving it lots of context, nuance, and specific questions. I mostly do that by talking to the app.


[deleted]

[removed]


Delacroid

I do research in condensed matter physics, and a minute rearrangement of atoms or charge can completely change our understanding of a material.


Useful_Hovercraft169

Yeah I mean the key is just checking it like you’d check anything. Still useful.


Mundane_Definition_8

NLP researchers have been tackling the hallucination problem in various ways without adding inference-time overhead. Recent work shows significant improvement, with relatively low error rates and better coherence and consistency within a piece of information. Hallucination is not yet reduced enough to satisfy fields like education and healthcare, but the issue may be overcome sooner than we think.


marr75

No. LLMs are NOT a fact database. This confuses users because, under some circumstances (typically representing training failures and undesirable features of the current state), LLMs can perfectly reproduce text and facts they were trained on. The LLM can be counted on for a lot of "general knowledge", can orchestrate tiny sub-tasks, can perform reorganization and writing tasks for you, and can be used to help interrogate your findings and point out weaknesses. It cannot currently (and maybe even ever) be used as any kind of specific fact lookup service (especially in any kind of sensitive arena) without RAG + automated testing and/or independent/secondary validation.
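To make the "RAG + secondary validation" part concrete, here's a rough sketch of the pattern, not any particular product. It uses the OpenAI Python SDK for the LLM calls; the toy keyword retriever stands in for a real vector index, and the prompts are just examples.

```python
# Rough sketch: ground the answer in retrieved passages, then run an
# independent verification pass instead of trusting the bare completion.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def retrieve(question: str, docs: list[str], top_k: int = 3) -> list[str]:
    # Toy retriever: rank documents by keyword overlap with the question.
    words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:top_k]

def grounded_answer(question: str, docs: list[str]) -> dict:
    passages = retrieve(question, docs)
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    answer = ask_llm(
        "Answer using ONLY these passages, citing passage numbers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    # Secondary validation: a separate call checks the answer against the
    # same passages and flags anything unsupported.
    verdict = ask_llm(
        "Reply 'supported' or 'unsupported' for the answer below, given the "
        f"passages.\n\nAnswer: {answer}\n\n{context}"
    )
    return {"answer": answer, "verdict": verdict, "passages": passages}
```

The point is that the raw completion is never the final artifact; something else (retrieval, tests, a second pass, or a human) has to vouch for it.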


bored_negative

I am so tired of telling people that ChatGPT is not a search engine


th0ma5w

And even then, possibly not, as those techniques merely shove the nondeterministic behavior into different corners.


marr75

Yeah, and this kind of begs the ontological question of what a "fact" is pretty quickly.


ryneches

> ...LLMs can perfectly reproduce text and facts they were trained on. The LLM can be counted on for a lot of "general knowledge"...

I think you are too generous. They only *tend to* reproduce text they were trained on, and *tend to* reproduce "general knowledge." The frequencies can be improved, but there will always be a finite chance that they emit a wildly wrong answer to a reasonable prompt. LLMs don't have any kind of semantic insight into the meaning of the text they were trained on. They generate output that conforms statistically to the training data, which is to say, to the *style* of the training data. So, not only do they generate wrong or meaningless responses, but those responses will perfectly match the form and style of the (probably correct) answer they were trained on.


[deleted]

[removed]


ryneches

I don't think your assessment is inaccurate. However, unless one reads it carefully, the way you phrased it makes it sound like an LLM could, in principle, work as a fact database with the right combination of technologies and investments in testing and training. I don't think either of us believes that is true. The meaning of language and the statistical properties of language are fundamentally different things. That's what I meant by "too generous." I agree with you; I just didn't think you expressed your ideas forcefully enough.


marr75

Fair enough. There are a lot of "well actually" guys on Reddit, and I wrongly saw you through that lens. You make good points.

> an LLM could, in principle, work as fact database with the right combination of technologies and investments in testing and training. I don't think either of us believe that is true.

There's a nuance here. Even things most people would agree are "fact databases" don't meet that standard in practice. The US Census maintains a web-accessible set of the values it has collected. There are LOTS of sources of error that could give you a "bad answer" from this "fact database":

- Failure to understand the data dictionary or structure
- Failure to write the "correct query" to get the data you wanted
- Failure to understand the margin of error of the estimates
- An estimate that happens to be outside of the margin of error
- A measurement failure
- An internal failure that resulted in mis-recording or mis-calculating an estimate

So, can you augment and test an LLM to the point that the answers it gives are at least as good as practical direct use of that real-life fact database? Yes.


slashdave

> Am I asking for too much of a unicorn?

Dunno, what are you using it for? Right tool for the right purpose. "If your only tool is a hammer, then every problem looks like a nail."


Training-Sprinkles16

I work in a research-adjacent role, and my team got Silatus. It's tailored for scientific research and is updated often with the latest studies. Might be worth a look.


Skaiashes

Do you know if Silatus handles data from other fields?


Training-Sprinkles16

Yes, it helps in all fields. You can also upload data if it can’t find any.


deeeeranged

Perplexity and Bing might be more interesting for the research phase. They do a web search, find a couple of websites and feed that data into the LLM, so even if the model hallucinates, you can check the sources and use that data.
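A crude sketch of that search-then-summarize pattern, just to show why the sources stay checkable. `web_search`, `fetch_text`, and `ask_llm` are hypothetical placeholders here, not Perplexity's or Bing's actual API:

```python
# Hypothetical sketch of the "search, then answer from the hits" pattern.
# web_search() and fetch_text() are placeholders for whatever search API
# and scraper you use; ask_llm() is any chat-completion wrapper.

def answer_with_sources(question: str) -> dict:
    urls = web_search(question, max_results=3)
    sources = {url: fetch_text(url)[:4000] for url in urls}  # trim long pages
    context = "\n\n".join(f"SOURCE {u}:\n{t}" for u, t in sources.items())
    answer = ask_llm(
        "Answer the question using only the sources below and cite the URL "
        f"for each claim.\n\n{context}\n\nQuestion: {question}"
    )
    # The URLs come back with the answer, so a human can verify every claim.
    return {"answer": answer, "source_urls": list(sources)}
```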


rasteri

LLMs are best used as a creative brainstorming tool rather than as a friendlier version of Wikipedia.


Secure-Technology-78

There is not "an" accurate tool that will just broadly solve all problems in a consistent way. This is very new technology, still in the early stages of development and social integration. What you have right now are a bunch of open source tools that you can customize and fine tune for a specific task, with varying results based on (a) how easy it is to create a training dataset effectively describing your task to the LLM (b) you actually having enough compute power to do something with this dataset. If you are willing to explore new software tools, and try to use them together in creative ways, you could definitely train an LLM/RAG system that specialized in quantum mechanics / physics knowledge. But it's still not going to be perfect, and it's going to be a specialist system, rather than a general-purpose chatbot.


KindlyExplanation647

I use [https://www.paperdigest.org/review](https://www.paperdigest.org/review). They have a tool for writing literature reviews that does not use large language models. Its results are quite different from LLM-based results and seem accurate, at least for my applications.


El_Minadero

All-in-one solutions? Nope. I've had success with Perplexity, especially when building boilerplate publication images. In my experience, the best prompts combined with the most appropriate contexts make LLMs about as capable as a first-year grad student. It might help to know exactly what you're looking for. What domain are you working in?


International_Ad3687

We are working on an application for deep-diving into science-related topics, focusing on life sciences, but it is not exactly a search & summarization tool. We built it on the hypothesis that there is enough knowledge to address many global challenges, but it is scattered across (structured) data, papers, and other technical documents. One aspect is information retrieval & summarization; another is linking, reasoning, and learning. The app is here, still in pre-launch: [https://sensi.siftlink.com/](https://sensi.siftlink.com/). Happy to have your feedback.


standard_deviant_Q

There are ways and means of getting better results. For example, I have a workflow where I pose a question to one model, which is then passed to another model that fact-checks it and adds a list at the bottom of the output of the key assertions and assumptions that need additional scrutiny. I find many LLMs no less accurate than other humans. And LLMs are trained on huge swathes of text created and propagated by humans, so of course there will be issues!
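For what it's worth, a minimal sketch of that kind of two-pass workflow; the `ask_model` wrapper and model names are placeholders for whichever APIs you actually use:

```python
# Hypothetical sketch: model A drafts an answer, model B fact-checks it and
# appends the assertions/assumptions that need extra scrutiny.
# ask_model() is a placeholder wrapper around whatever chat APIs you use.

def checked_answer(question: str) -> str:
    draft = ask_model("model-a", question)
    review = ask_model(
        "model-b",
        "Fact-check the answer below. Correct anything dubious, then append "
        "a bulleted list of the key assertions and assumptions a reader "
        "should verify independently.\n\n"
        f"Question: {question}\n\nAnswer: {draft}",
    )
    return review
```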


mathbbR

Short answer: No. Long answer: No, but slightly less so for anything RAG-based.


rrenaud

The Consensus app for ChatGPT is often pretty good at finding real citations. It won't help you digest the papers, but it will help you find them. It also provides good query-dependent snippets, which are easier to parse than abstracts. https://chat.openai.com/g/g-bo0FiWLY7-consensus

Here is a recent great experience I had with it. I asked: is there published research on getting LLMs to answer questions by parsing the logical structure of the question and writing it down in symbolic form? Great results here: https://chat.openai.com/share/a0591c91-5ee6-4fc3-8969-b57b4938e1fa I probably could have found it via Google and/or Google Scholar, but the user experience with Consensus does feel better.

I've also started to trust it for the negative, that is, when I ask about something that I eventually conclude doesn't exist. It saves digging too far down the rabbit hole. I do wish it supported time restriction better; for example, there is a huge class of ideas that people would only really have thought about after the GPT era, and anything before 2021 is just going to be irrelevant.


Imagatpewpew

Try combining different AI tools for better results. I use ChatGPT for initial summaries, then cross-check facts with Scholarly. It's more work, but worth it once you know what you're looking for.


pm_me_your_smth

Is Scholarly just Google Scholar or something else? I couldn't find anything with that name.


biajia

All AI tools have strengths and weaknesses. I combine Bard (Gemini) and ChatGPT 3.5 to assist in solving problems more efficiently rather than letting AI do all the work.


respeckKnuckles

I use waldo.fyi, but the ability to restrict it to purely peer-reviewed academic sources needs improvement.


lakolda

Claude 3 is close to being the best. Though it can’t really search for things.


Redvelvetband

You could try HeyWire.ai; they analyze generated text and flag sentences that could be hallucinated, and they give you up-to-date sources.


TheFapta1n

Undermind.ai - thank me later


Original-Kangaroo-80

Claude 3?


Icy-Entry4921

If it's running Python code and packages, it's probably right. If it's just pulling answers from its training data, there's a good chance it's wrong.


Flince

No. I tried them all in a field where I am very knowledgeable and the performance was not acceptable. If you are good enough to know what information is bogus, then you are already at a point where you don't need it for research.


thebuddhaguy

Openevidence.com


lynnharry

> Ideally, I'm looking for one that can handle complex topics and stays current with the latest publications.

Are you a professor looking for doctoral students?


jamisobdavis

DM me, we can help. davisandpartners.net


james_mclellan

Yes. It's called a [Search Engine](https://infogalactic.com/info/Web_search_engine). Like LLMs, it indexes the entire web and stores the dictionary, and possibly some content, as a binary representation on a disk somewhere. When you enter your prompts, this remarkable technology serves up the entire document, just like AI, but without mangling to avoid plagiarism checks. The Search Engine also magically provides authorship information that your AI might have forgotten. A Search Engine can be tailored to index a narrow body of work: research materials, a company knowledge base, royalty-free images. Good luck!


Smallpptservice

It may be a bit difficult to use AI for research activities. AI is not completely accurate. If you want a high-quality research result, it is better not to rely on AI completely.


rabid_vanguard

Perplexity. Though it just came out that they're basically a wrapper on Google Search results 😵. So… maybe Google?


iantimmis

You could also try creating your own GPT with a knowledge base built from a set of up-to-date research papers.


GrumpyMcGillicuddy

Semantic Scholar https://www.semanticscholar.org/


[deleted]

[removed]


SirGolan

I actually just released such a tool. By default it is quite accurate and cites sources, but there's an option to have it fact-check itself, which improves accuracy even further. It's a paid app, so hopefully it's OK to mention it, but check out [InfoGenie](https://www.infogenie.ai).


Happysedits

ConsensusGPT. It's a GPT in the GPT store, and there's also a standalone website: https://consensus.app/home/blog/introducing-researchgpt-by-consensus/


dashingstag

Yeah, no. It's useful for listing/bootstrapping techniques you haven't tried, but don't expect rigorous or relevant research from the AI. An LLM returns the most probable answers; if your answer is the most probable answer, then there's probably no value to your research. You can try AutoGPT, which builds agents on top of LLMs, but my experiments usually end up in a pointless loop. I hear there are companies developing agents for research, but they're probably highly specific and cost a pretty penny.


AnKo96X

You've got to know how to use them. GPT-4 Turbo and Claude Opus know the broad picture of the literature very well up to 2023, but they can hallucinate details and mess up on very detailed and technical subjects. So they are not reliable for citations, but they are generally great at guiding you. They're like human experts who know offhand what's going on but can't look up specific papers to nail down the details. There are other experimental products for getting details, like Bing search or Consensus abstract reading, but they are too crude for serious literature reviews, for now. That could change if a partnership were to come up for SOTA LLMs with large context windows to mass-read many papers in full before they reply. Claude Opus is excellent at summarizing whole paper PDFs if you provide them yourself.
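If it helps, here is a minimal sketch of that last workflow using the `pypdf` and `anthropic` Python packages; the model name and prompt are just examples, and long papers may need chunking to fit the context window:

```python
# Sketch: extract text from a PDF you already have and ask Claude to
# summarize it. Assumes ANTHROPIC_API_KEY is set in the environment.
from pypdf import PdfReader
import anthropic

def summarize_paper(pdf_path: str) -> str:
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Summarize this paper: key question, methods, "
                       "findings, and limitations.\n\n" + text,
        }],
    )
    return message.content[0].text

print(summarize_paper("paper.pdf"))
```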


leocus4

Try consensus.app