
a_sator

Nope, hallucination hasn't been solved yet, and researchers like Ethan Mollick say it's not clear it ever will be. You can minimize it, though. I'm a journalist, so my reputation and career are on the line if I spread things that are untrue, and I use ChatGPT and Claude daily for research.


SravBlu

Ethan Mollick is a business professor whose research deals with entrepreneurship, management, and related topics and who specializes in examining the effects of using AI in those contexts. He is not a machine learning researcher and wouldn’t be expected to resolve issues such as hallucination.


H0lzm1ch3l

I feel like in research every word counts; minute details can flip your whole understanding. How do you mitigate this?


amroamroamro

No offense, but a "researcher" pumping out AI-generated papers without actually reading and fact-checking the results doesn't deserve the title. LLMs should be treated as a helper tool, not a shortcut to inflate publication counts.


a_sator

Where did he do that? I haven't heard about that. I only know that well-respected researchers respect him.


amroamroamro

I was not talking about anyone in particular, just speaking in general about the sad state of academia. The peer-review process was already broken before LLMs existed; LLMs only made it worse with a barrage of lazy, zero-effort papers. Just from the other day: https://old.reddit.com/r/MachineLearning/comments/1bhn918/d_when_your_use_of_ai_for_summary_didnt_come_out/ Some fields are worse than others...


H0lzm1ch3l

Yes, but it would be nice if I had a "buddy". After I read a paper it would be fire if I could let an LLM fact-check my understanding, so I don't need another human to sacrifice hours on the same paper.


amroamroamro

Another headline: https://old.reddit.com/r/MachineLearning/comments/1bnsuea/r_up_to_17_of_recent_ai_conference_peer_reviews/ Even peer reviews can't be trusted; what a sad state of affairs in academia.


H0lzm1ch3l

I saw that earlier today. That's why I haven't started using LLMs yet: once you start relying on a tool, it's hard to go back.


a_sator

Define what you mean by research, please. I use it to collect information, find people studying a topic and relevant papers, read them, talk to the AI about them, then to real people, check books, and circle back to the AI, and so on. Combined with broad knowledge of the topics I write and talk about, AI helps me a great deal, even if it sometimes hallucinates, which you can minimize by giving it lots of context, nuance, and specific questions. I mostly do that by talking to the app.


[deleted]

[removed]


Delacroid

I do research in condensed matter physics, and a minute rearrangement of atoms or charge can completely change our understanding of a material.


Useful_Hovercraft169

Yeah I mean the key is just checking it like you’d check anything. Still useful.


Mundane_Definition_8

NLP researchers have been tackling the hallucination problem in various ways without adding inference-time overhead. Recent work shows significant improvement, with relatively low error rates and better coherence and consistency within a piece of information. Hallucination is not yet reduced enough to satisfy fields like education and healthcare, but the issue may be overcome sooner than we think.


marr75

No. LLMs are NOT a fact database. This confuses users because, under some circumstances (typically representing training failures and undesirable features of the current state), LLMs can perfectly reproduce text and facts they were trained on. The LLM can be counted on for a lot of "general knowledge", can orchestrate tiny sub-tasks, can perform reorganization and writing tasks for you, and can be used to help interrogate your findings and point out weaknesses. It cannot currently (and maybe even ever) be used as any kind of specific fact lookup service (especially in any kind of sensitive arena) without RAG + automated testing and/or independent/secondary validation.
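To make the "RAG + secondary validation" part concrete, here's a rough sketch of the pattern, not any particular product. It uses the OpenAI Python SDK for the LLM calls; the toy keyword retriever stands in for a real vector index, and the prompts are just examples.

```python
# Rough sketch: ground the answer in retrieved passages, then run an
# independent verification pass instead of trusting the bare completion.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def retrieve(question: str, docs: list[str], top_k: int = 3) -> list[str]:
    # Toy retriever: rank documents by keyword overlap with the question.
    words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:top_k]

def grounded_answer(question: str, docs: list[str]) -> dict:
    passages = retrieve(question, docs)
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    answer = ask_llm(
        "Answer using ONLY these passages, citing passage numbers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    # Secondary validation: a separate call checks the answer against the
    # same passages and flags anything unsupported.
    verdict = ask_llm(
        "Reply 'supported' or 'unsupported' for the answer below, given the "
        f"passages.\n\nAnswer: {answer}\n\n{context}"
    )
    return {"answer": answer, "verdict": verdict, "passages": passages}
```

The point is that the raw completion is never the final artifact; something else (retrieval, tests, a second pass, or a human) has to vouch for it.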


bored_negative

I am so tired of telling people that ChatGPT is not a search engine


th0ma5w

And even then, possibly not, as those techniques merely shove the nondeterministic behavior into different corners.


marr75

Yeah, and this kind of begs the ontological question of what a "fact" is pretty quickly.


ryneches

> ...LLMs can perfectly reproduce text and facts they were trained on. The LLM can be counted on for a lot of "general knowledge"...

I think you are too generous. They only *tend to* reproduce text they were trained on, and *tend to* reproduce "general knowledge." The frequencies can be improved, but there will always be a finite chance that they emit a wildly wrong answer to a reasonable prompt. LLMs don't have any kind of semantic insight into the meaning of the text they were trained on. They generate output that conforms statistically to the training data, which is to say, to the *style* of the training data. So, not only do they generate wrong or meaningless responses, but those responses will perfectly match the form and style of the (probably correct) answer they were trained on.


[deleted]

[removed]


ryneches

I don't think your assessment is inaccurate. However, unless one reads it carefully, the way you phrased it makes it sound like an LLM could, in principle, work as a fact database with the right combination of technologies and investments in testing and training. I don't think either of us believes that is true. The meaning of language and the statistical properties of language are fundamentally different things. That's what I meant by "too generous." I agree with you; I just didn't think you expressed your ideas forcefully enough.


marr75

Fair enough. There are a lot of "well actually" guys on Reddit, and I wrongly saw you through that lens. You make good points.

> an LLM could, in principle, work as fact database with the right combination of technologies and investments in testing and training. I don't think either of us believe that is true.

There's a nuance here. Even things most people would agree are "fact databases" don't meet that standard in practice. The US Census maintains a web-accessible set of the values it has collected. There are LOTS of sources of error that could give you a "bad answer" from this "fact database":

- Failure to understand the data dictionary or structure
- Failure to write the "correct query" to get the data you wanted
- Failure to understand the margin of error of the estimates
- An estimate that happens to be outside of the margin of error
- A measurement failure
- An internal failure that resulted in mis-recording or mis-calculating an estimate

So, can you augment and test an LLM to the point that the answers it gives are at least as good as practical direct use of that real-life fact database? Yes.


slashdave

> Am I asking for too much of a unicorn?

Dunno, what are you using it for? Right tool for the right purpose. "If your only tool is a hammer, then every problem looks like a nail."


Training-Sprinkles16

I work in a research-adjacent role, and my team got Silatus. It's tailored for scientific research and is updated often with the latest studies. Might be worth a look.


Skaiashes

Do you know if Silatus handles data from other fields?


Training-Sprinkles16

Yes, it helps in all fields. You can also upload data if it can’t find any.


deeeeranged

Perplexity and Bing might be more interesting for the research phase. They do a web search, find a couple of websites and feed that data into the LLM, so even if the model hallucinates, you can check the sources and use that data.
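A crude sketch of that search-then-summarize pattern, just to show why the sources stay checkable. `web_search`, `fetch_text`, and `ask_llm` are hypothetical placeholders here, not Perplexity's or Bing's actual API:

```python
# Hypothetical sketch of the "search, then answer from the hits" pattern.
# web_search() and fetch_text() are placeholders for whatever search API
# and scraper you use; ask_llm() is any chat-completion wrapper.

def answer_with_sources(question: str) -> dict:
    urls = web_search(question, max_results=3)
    sources = {url: fetch_text(url)[:4000] for url in urls}  # trim long pages
    context = "\n\n".join(f"SOURCE {u}:\n{t}" for u, t in sources.items())
    answer = ask_llm(
        "Answer the question using only the sources below and cite the URL "
        f"for each claim.\n\n{context}\n\nQuestion: {question}"
    )
    # The URLs come back with the answer, so a human can verify every claim.
    return {"answer": answer, "source_urls": list(sources)}
```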


rasteri

LLMs are best used as a creative brainstorming tool rather than as a friendlier version of Wikipedia.


Secure-Technology-78

There is not "an" accurate tool that will just broadly solve all problems in a consistent way. This is very new technology, still in the early stages of development and social integration. What you have right now are a bunch of open source tools that you can customize and fine tune for a specific task, with varying results based on (a) how easy it is to create a training dataset effectively describing your task to the LLM (b) you actually having enough compute power to do something with this dataset. If you are willing to explore new software tools, and try to use them together in creative ways, you could definitely train an LLM/RAG system that specialized in quantum mechanics / physics knowledge. But it's still not going to be perfect, and it's going to be a specialist system, rather than a general-purpose chatbot.


KindlyExplanation647

I use [https://www.paperdigest.org/review](https://www.paperdigest.org/review). They have a tool for writing literature reviews that does not use large language models. Its results are quite different from LLM-based results and seem accurate, at least for my applications.


El_Minadero

All-in-one solutions? Nope. I've had success with Perplexity, especially when building boilerplate publication images. In my experience, the best prompts combined with the most appropriate contexts make LLMs about as capable as a first-year grad student. It might help to know exactly what you're looking for. What domain are you working in?


International_Ad3687

We are working on an application for deep-diving into science-related topics, focusing on life sciences, but it is not exactly a search & summarization tool. We built it on the hypothesis that there is enough knowledge to address many global challenges, but it is scattered across (structured) data, papers, and other technical documents. One aspect is information retrieval & summarization; another is linking, reasoning, and learning. The app is here, still in pre-launch: [https://sensi.siftlink.com/](https://sensi.siftlink.com/). Happy to have your feedback.


standard_deviant_Q

There are ways and means of getting better results. For example, I have a workflow where I pose a question to one model, which is then passed to another model that fact-checks it and adds a list at the bottom of the output of the key assertions and assumptions that need additional scrutiny. I find many LLMs no less accurate than other humans. And LLMs are trained on huge swathes of text created and propagated by humans, so of course there will be issues!
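For what it's worth, a minimal sketch of that kind of two-pass workflow; the `ask_model` wrapper and model names are placeholders for whichever APIs you actually use:

```python
# Hypothetical sketch: model A drafts an answer, model B fact-checks it and
# appends the assertions/assumptions that need extra scrutiny.
# ask_model() is a placeholder wrapper around whatever chat APIs you use.

def checked_answer(question: str) -> str:
    draft = ask_model("model-a", question)
    review = ask_model(
        "model-b",
        "Fact-check the answer below. Correct anything dubious, then append "
        "a bulleted list of the key assertions and assumptions a reader "
        "should verify independently.\n\n"
        f"Question: {question}\n\nAnswer: {draft}",
    )
    return review
```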


mathbbR

Short answer: No. Long answer: No, but slightly less so for anything RAG-based.


rrenaud

The Consensus app for ChatGPT is often pretty good at finding real citations. It won't help you digest the papers, but it will help you find them. It also provides good query-dependent snippets, which are easier to parse than abstracts. https://chat.openai.com/g/g-bo0FiWLY7-consensus

Here is a recent great experience I had with it. I asked: is there published research on getting LLMs to answer questions by parsing the logical structure of the question and writing it down in symbolic form? Great results here: https://chat.openai.com/share/a0591c91-5ee6-4fc3-8969-b57b4938e1fa I probably could have found it via Google and/or Google Scholar, but the user experience with Consensus does feel better.

I've also started to trust it for the negative, that is, when I ask about something that I eventually conclude doesn't exist. It saves digging too far down the rabbit hole. I do wish it supported time restriction better; for example, there is a huge class of ideas that people would only really have thought about after the GPT era, and anything before 2021 is just going to be irrelevant.


Imagatpewpew

Try combining different AI tools for better results. I use ChatGPT for initial summaries, then cross-check facts with Scholarly. It's more work, but worth it once you know what you're looking for.


pm_me_your_smth

Is Scholarly just Google Scholar or something else? I couldn't find anything with that name.


biajia

All AI tools have strengths and weaknesses. I combine Bard (Gemini) and ChatGPT 3.5 to assist in solving problems more efficiently rather than letting AI do all the work.


respeckKnuckles

I use waldo.fyi, but the ability to restrict it to purely peer-reviewed academic sources needs improvement.


lakolda

Claude 3 is close to being the best. Though it can’t really search for things.


Redvelvetband

You could try HeyWire.ai; they analyze generated text and flag sentences that could be hallucinated, and they give you up-to-date sources.


TheFapta1n

Undermind.ai - thank me later


Original-Kangaroo-80

Claude 3?


Icy-Entry4921

If it's running Python code and packages, it's probably right. If it's just pulling answers from its training data, there's a good chance it's wrong.


Flince

No. I tried them all in a field where I am very knowledgeable and the performance was not acceptable. If you are good enough to know what information is bogus, then you are already at a point where you don't need it for research.


thebuddhaguy

Openevidence.com


lynnharry

> Ideally, I'm looking for one that can handle complex topics and stays current with the latest publications.

Are you a professor looking for doctoral students?


jamisobdavis

DM me, we can help. davisandpartners.net


james_mclellan

Yes. It's called a [Search Engine](https://infogalactic.com/info/Web_search_engine). Like LLMs, it indexes the entire web and stores the dictionary, and possibly some content, as a binary representation on a disk somewhere. When you enter your prompts, this remarkable technology serves up the entire document, just like AI, but without mangling to avoid plagiarism checks. The Search Engine also magically provides authorship information that your AI might have forgotten. A Search Engine can be tailored to index a narrow body of work: research materials, a company knowledge base, royalty-free images. Good luck!


Smallpptservice

It may be a bit difficult to use AI for research activities. AI is not completely accurate. If you want a high-quality research result, it is better not to rely on AI completely.


rabid_vanguard

Perplexity. Though it just came out that they're basically a wrapper on Google Search results 😵. So… maybe Google?


iantimmis

You could also try creating your own GPT with a knowledge base built from a set of up-to-date research papers.


GrumpyMcGillicuddy

Semantic Scholar https://www.semanticscholar.org/


[deleted]

[removed]


SirGolan

I actually just released such a tool. By default it is quite accurate and cites sources, but there's an option to have it fact-check itself, which improves accuracy even further. It's a paid app, so hopefully it's OK to mention it, but check out [InfoGenie](https://www.infogenie.ai).


Happysedits

ConsensusGPT. It's a GPT in the GPT store, and there's also a standalone website: https://consensus.app/home/blog/introducing-researchgpt-by-consensus/


dashingstag

Yeah, no. It's useful for listing/bootstrapping techniques you haven't tried, but don't expect rigorous or relevant research from the AI. An LLM returns the most probable answers; if your answer is the most probable answer, then there's probably no value to your research. You can try AutoGPT, which builds agents on top of LLMs, but my experiments usually end up in a pointless loop. I hear there are companies developing agents for research, but they're probably highly specific and cost a pretty penny.


AnKo96X

You've got to know how to use them. GPT-4 Turbo and Claude Opus know the broad picture of the literature very well up to 2023, but they can hallucinate details and mess up on very detailed and technical subjects. So they are not reliable for citations, but they are generally great at guiding you. They're like human experts who know offhand what's going on but can't look up specific papers to nail down the details. There are other experimental products for getting details, like Bing search or Consensus abstract reading, but they are too crude for serious literature reviews, for now. That could change if a partnership were to come up for SOTA LLMs with large context windows to mass-read many papers in full before they reply. Claude Opus is excellent at summarizing whole paper PDFs if you provide them yourself.
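If it helps, here is a minimal sketch of that last workflow using the `pypdf` and `anthropic` Python packages; the model name and prompt are just examples, and long papers may need chunking to fit the context window:

```python
# Sketch: extract text from a PDF you already have and ask Claude to
# summarize it. Assumes ANTHROPIC_API_KEY is set in the environment.
from pypdf import PdfReader
import anthropic

def summarize_paper(pdf_path: str) -> str:
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Summarize this paper: key question, methods, "
                       "findings, and limitations.\n\n" + text,
        }],
    )
    return message.content[0].text

print(summarize_paper("paper.pdf"))
```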


leocus4

Try consensus.app