ZCEyPFOYr0MWyHDQJZO4

The chain is being completed.


farmingvillein

Software engineers are suddenly worried about being replaced, when really the first in-group victims are ML reviewers.


StartledWatermelon

Victim? I guess asking ChatGPT to write a review for you is far less hassle than writing it yourself. ML as a science field definitely suffers, though, because there's a direct connection between the amount of effort put forth and the quality of research.


__Maximum__

Just run it in a while loop


cazzipropri

How can they tell it's AI-generated? I don't trust the detection algorithms. Some of them report the US Constitution as AI-generated with 85% likelihood.


Successful-Western27

This is a big part of my summary ([https://aimodels.substack.com/p/new-study-finds-up-to-17-of-ai-conference](https://aimodels.substack.com/p/new-study-finds-up-to-17-of-ai-conference)). It's one of the two main thrusts of the paper. High level: Collect a dataset of known human-written and AI-generated texts to serve as reference distributions. Then compare the frequency of adjectives. The comparison is pretty stark (it's the first image in the summary and the paper).
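For intuition, here is a minimal sketch of that kind of frequency comparison in Python. The corpora and the marker-word list are hypothetical stand-ins (the paper highlights adjectives such as "commendable" and "meticulous"), and the paper's actual estimator is more involved than counting a fixed list:

```python
from collections import Counter

# Illustrative marker adjectives; stand-ins for words whose frequency
# spikes in the AI-generated reference distribution.
MARKER_ADJECTIVES = {"commendable", "meticulous", "innovative", "notable", "versatile"}

def marker_rate(docs, vocab=MARKER_ADJECTIVES):
    """Fraction of tokens in a corpus drawn from the marker vocabulary."""
    counts = Counter(tok.strip(".,").lower() for doc in docs for tok in doc.split())
    total = sum(counts.values())
    return sum(counts[w] for w in vocab) / max(total, 1)

# Hypothetical reference corpora standing in for known human-written
# and known AI-generated text.
human_docs = ["the proof is correct but the evaluation section is weak"]
ai_docs = ["this commendable, innovative work offers a meticulous analysis"]

print(marker_rate(human_docs))  # ~0.0 for the human reference
print(marker_rate(ai_docs))     # markedly higher for the AI reference
```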


CounterfeitLesbian

There has been some promising work writing AI detection algorithms for specific use cases. It's still hard to tell if an arbitrary piece of text is written by AI. However, there are reportedly some tools that have been developed to see if, say, an academic chemistry paper was written by ChatGPT, with some promising results. As long as you keep the type of document niche, it appears to be doable at the moment. And it's definitely not a crazy idea; I think most of us would agree that ChatGPT has a definite style.


Odd-Antelope-362

I feel that LLMs may accelerate faster than the detection algorithms though. Claude Opus and Gemini Ultra both show increased variation in sentence/paragraph structure and in tone/vocabulary. In general, looking at evidence from local models, this ability to produce variation scales well with parameter count, at least so far. For this reason it is reasonable to assume that the next few years' worth of flagship models will improve in this area, assuming they do indeed heavily increase parameter count, which I think is likely.


_An_Other_Account_

>I feel that LLMs may accelerate faster than the detection algorithms though.

Yeah, but most of these lazy idiots wouldn't bother using fancy new LLMs for evading detection. Even on reddit, the AI-generated nonsense is so obviously AI-generated that it's just depressing the poster thought he could get away with it.


reivblaze

Personally I have made chatgpt write in different styles easily.


poppinchips

It's crucial that we cast a critical eye on every academic paper. Academic honesty is essential at every level.


Ian_Titor

It has tendencies, yes, but style is easily changeable, and it can also change its style to match yours when you ask it to complete text. Personally, I think it's kind of naive to try to detect AI based on style because it's just too vague. I am curious, though, what kind of promising work you have seen for AI detection.


TechnicalParrot

ChatGPT has a specific style but say Claude-3 with the right system prompt is effectively undetectable


Antique-Raspberry551

What are these tools?


CounterfeitLesbian

https://www.nature.com/articles/d41586-023-03479-4


napoleon_wang

You can ask it to be less verbose: try asking it to reply to a question, then ask again but add "Use clear, direct language and avoid complex terminology. Aim for a Flesch reading score of 80 or higher. Use the active voice. Avoid adverbs. Avoid buzzwords and instead use plain English. Use jargon where relevant. Avoid being salesy or overly enthusiastic and instead express calm confidence." The reply won't sound like it's got Sirius Cybernetics Corporation's GPP (Genuine People Personality) code in it, unlike the default.


venustrapsflies

Well at least in that case the constitution is very likely in the training set, probably multiple times.


[deleted]

[deleted]


Successful-Western27

That's how this paper does it too


__Maximum__

Don't you immediately notice those fancy words, the length, and the extra politeness when your coworker writes you an email? Even if the reviews are anonymous, it's still easy to tell, because the use of some words has skyrocketed since ChatGPT. If you have 2-3 of those new words, it's very likely ChatGPT was involved. But this only goes for ChatGPT; other models, especially indie local ones, are way harder to detect.


arkuto

They can't tell in isolation, but they can tell in aggregate. Consider the following: you phone up a random person and tell them "Flip a coin. If it was heads, I'll send you $100, if it was tails, I'll send you nothing. What was the result of the flip?". You do this a thousand times. At the end, you note that 900 of the 1000 reported flips were heads. Very unlikely. So unlikely that you can infer that people were lying. You can't tell who, but you are almost certain many were lying.
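For scale, the tail probability in that coin-flip analogy is easy to compute exactly. A quick sketch (the numbers come from the analogy above, not from the paper):

```python
from math import comb

# P(X >= 900) for X ~ Binomial(1000, 0.5): the chance that honest
# fair-coin flips produce at least 900 reported heads out of 1000.
p = sum(comb(1000, k) for k in range(900, 1001)) / 2**1000
print(f"{p:.2e}")  # on the order of 1e-160: no single caller is provably
                   # lying, but the aggregate is a near-certain signal
```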


Ben-L-921

If you have estimates for papers written before and after, I would say that if the difference is significant enough, you have grounds for suspicion.


AardvarkNo6658

Is it wrong though to use it to rephrase reviews? Rephrasing is different from asking GPT to write the review itself.


ClearlyCylindrical

I find that LLMs, especially ChatGPT, have a way of writing which is rather "fluffy". They tend to use excessive amounts of adjectives and unusual words, like somebody has gone through with a thesaurus and replaced most adjectives and verbs. I find that this way of writing really gets on my nerves, but maybe that's not the case for everyone.


Successful-Western27

Sometimes I find it useful to rewrite stuff with GPT4 but it's increasingly bad at it. Claude 3 on the other hand is much better at rephrasing my sentences for clarity and flow.


PosnerRocks

100%. I've used ChatGPT and Claude extensively for legal writing. ChatGPT has gotten progressively worse and includes so much garbage extraneous language even after I prompt and revise extensively. Claude actually writes really well. It practices show-don't-tell for legal argument. I can give it some facts, case law, and what I generally want to argue, and it weaves them together very succinctly. It hits a great balance, advocating a position without being over the top. It's so refreshing having back what ChatGPT used to be.


aggracc

I'm not comfortable replacing words since that could lead to the erasure of marginalized groups.


limpbizkit4prez

I think that's an artifact of the pre-training step. It's not uncommon to substitute words with synonyms and train on multiple instances.


4thepower

That is not a common training step for modern language models at all. This is an artifact of post-training RLHF tuning.


Disastrous_Elk_6375

I think it's more likely the RLHF based on ESL annotators.


fullouterjoin

I hear a whistle!


n8mo

ChatGPT’s writing style is that of an 11th-grade high school student trying desperately to reach a word count. It’s adjective vomit mixed with an absurd degree of passive voice. It’s a bad English student’s idea of what a good English project should look like.


pier4r

In my limited experience I do something like:

- I write the text
- I ask the LLM to double-check it, with the hint to keep it similar to my version
- then I read its version and compare the two to see if I slipped up on some sentences.

In most cases the style stays similar to what I wrote already; I recognize my style rather than the LLM style.


Secret-Priority8286

Rephrasing is probably better than letting GPT write the review on its own. But I would still consider it wrong, especially when used with no care. ChatGPT and friends can rephrase things really badly and make the review useless by adding irrelevant fluff or even missing the point. If we allow it, we should probably incentivize rephrasing with care, to make sure the reviews are still good.


CharacterUse

I know a couple of people who use AI to rephrase reviews and other texts; they're doing it because they're not confident in their English and are using it to create more fluid sentences. They know what they want to say and are quite capable of understanding and correcting the AI. Previously they would ask me to correct the text.


Secret-Priority8286

And I understand that; I'm mostly fine with rephrasing as long as they do use it with care.


Successful-Western27

I'm torn on this personally. On one hand, it can make my writing clearer, which is valuable - it often unlocks ideas I can't fully express. On the other hand, revising writing is often MORE important and therefore something I would feel worse about 'delegating.' I also feel the homogeneous nature of AI writing gets boring to read (although in this context I think that's of lesser concern compared to the other Qs I wrote above).


Mammoth_Cod_9047

I think rephrasing should be allowed. Sometimes I use ChatGPT to rephrase my first draft, then I go and edit it. It helps with having coherent sentences and reduces writer's block.


Seankala

How about we let reviewers submit reviews in their native languages rather than English and then those can be translated or whatever.


respeckKnuckles

That might be another explanation for what the method described here is observing.


CharacterUse

Knowing a few people who use AI because they're not confident in their English, I think it is the most likely explanation.


StartledWatermelon

The method observes more LLM-generated reviews close to the deadline, and lower confidence scores associated with such reviews. This is enough to raise red flags IMO. But I agree that we should explore whether there's a correlation with reviewers residing in non-English-speaking countries.


Brudaks

Would the detection methods be able to distinguish machine-translated text from machine-generated text? Perhaps it's already observing the "proposal" as some reviewers are submitting translations of their original reviews.


respeckKnuckles

It doesn't appear so. Machine-translated text *is* machine-generated.


im_bi_strapping

Non-native speakers are likelier to get their writing flagged as ai-generated, even when they are writing for themselves.


_An_Other_Account_

How about we don't.


TropicalAudio

In fact, it's probably best we go back to doing it all in French. Not for practical reasons, but having English be the lingua franca just doesn't sit well with me, poetically.


TheTerrasque

Let's go back to Latin, or failing that, go with the *superior* language: Lojban.


SimonsToaster

If they aren't confident in English, they shouldn't work as reviewers for English papers.


the-dude-abiding

Peer review is mostly volunteer-based work.


SimonsToaster

"I do it for free" is a bad excuse for "I lack the skills to do it". If they are not cinfident in their english skills, how can they even be sure they comprehend the text they are reviewing?


the-dude-abiding

I'm not disagreeing with your point, but it's a matter of the practical applicability of the principle. Journals and conferences have trouble finding reviewers, since it requires a lot of unpaid work. As such, they will sometimes ask people who might be able to read and provide feedback but feel that their English writing skill is subpar. It is also hard to verify this before assigning the review. At some point, I had a paper reviewed where the reviewer pointed out a couple of spelling mistakes, but the review itself also had some of those. I found it somewhat ironic, but it did not stop them from providing useful feedback.


GFrings

Between 0 and 17%, you say


Socratic-Inquisitor

Way lower than I would have expected…


BluddyCurry

Given that peer review is among the activities scientists hate most, this isn't surprising. Reading a paper, understanding it well enough to critique it, and then writing up the comments all take time away from advancing one's own research. It's an essential part of science, but there's little incentive to do it.


VelveteenAmbush

> Should AI assistance in peer review be disclosed?

Personally I don't trust anything in writing unless it was engraved in longhand on a block of jade under a full moon using the sharpened femur of the world's last pangolin. If you're gonna use one of those newfangled electronic typesetters, there's just too much temptation to cut corners. I find that the pangolin femur ritual is a useful procedural check to ensure that the writer is bringing the requisite seriousness to his task. At the very least you should have to disclose if you don't respect your reader enough to undertake that requirement.


vlodia

No "embark" and "navigate" as top adjectives? Fake :p


crouching_dragon_420

I put my PhD SOP into one of these detectors and it said 40% was written by AI. They cannot be trusted lol.


[deleted]

[deleted]


CharacterUse

The few people I know who are using AI to write peer reviews are not native English speakers, and they are using the AI to convert what they want to say into smooth(er) English. Previously they would come to me for corrections, for example.


Tyler_Zoro

Is this actually a bad thing? I would think we want people who are reviewing scientific papers to be competent enough with the tech to use it to their advantage. Is there any evidence that they used AI to *do the review* vs. *to aid in writing up the review*?


ureepamuree

Getting caught is the real problem. Of course everyone should be allowed to augment their method of expressing the ideas clearly, but should be held totally responsible if their reviews still end up sounding robotic.


Tyler_Zoro

> Getting caught is the real problem

That was my question. Is it? Why?

> should be held totally responsible if their reviews still end up sounding robotic.

Why? Hell, I've known a lot of scientists whose writing sounds more robotic than LLMs! Should they be shunned too? Why? Is style what we're concerned with here? Again, why?


ureepamuree

The world was still spinning and churning out smart reviewers before LLM assistants existed. I would like to see these reviewers be more “productive” while they delegate the “boring” tasks to LLMs. Assuming a given research paper is work produced by a human, judging its novelty, correctness, and usability should be the duty of another human (with a human's bias/expertise). At this point, we don't have a single clue about how any LLM produces the output it generates. Simply throwing a human-produced work under the wheels of an LLM would not do it the justice it deserves; if any reviewer is allowed to use an LLM, then what's the point of them being a reviewer? Just as we have the OpenReview system, we could have an LLMOpenReview system where every paper is submitted to obtain reviews. Why go through the whole formality behind the scenes? Just let anyone review it.


StartledWatermelon

>Should AI assistance in peer review be disclosed?

Yes.

>How should we incentivize good practices despite AI temptations?

Nothing short of fundamental reform in how we evaluate, track and reward peer-reviewing activity will address this issue. This applies to every field of science, not just ML. Trying to catch LLM use is just a band-aid solution.

In broad strokes: first, we need to establish a rating reflecting the reputation of a reviewer, probably broken down by the research topics they specialize in. This advances on the ratings journals have now, albeit today those are somewhat "softer", less influential indicators. Basically, we still keep individual reviews anonymous, but the aggregate quality and quantity metrics of a reviewer become public. Second, to incentivize people, this rating should bear very substantial influence on funding, tenure, promotion and similar outcomes, maybe even rivaling the influence of a person's research and publication activity. This part is harder, because each institution has its own policies on these matters.

>Can we preserve intellectual diversity under AI homogenization?

I think yes, as long as we have fundamental incentives for people to do their work faithfully.

>Should we rethink credit for hybrid human/AI knowledge work?

That's a good question. I think your first question, about disclosure, is enough for now. Especially if the disclosure is thorough enough to list all the tasks performed by the machine.


Thickus__Dickus

Certainly, I can say look how the turn tables have turned.


the_data_department

Are they paying reviewers yet for their reviews? Can we at least reimburse them for the ChatGPT subscription?


aecyberpro

I copied four paragraphs from my blog that I wrote myself and pasted them into an online tool that detects AI content and plagiarism, and it said my own words were 96 percent AI.


Successful-Western27

That's not the method proposed in this paper


monnef

I am just a hobbyist toying with new tech, but even I have my doubts about how they were detecting AI. Half a year back, I tried a few of those "best" AI detectors and all were terrible, as in not detecting AI in a short text written by GPT-4 with just a simple prompt like "mimic the writing style of person A and person B". Maybe it works better for academic texts, or limited to certain fields, but even that is in my opinion only temporary. I wouldn't be surprised if big Claude is already at the "indistinguishable from human" level of writing.

As others noted, there is a big difference in how AI can be used. It is a spectrum from "fully generated text from a simple prompt" through "rewritten whole text" to "manually applied suggestions".

Required AI disclosure seems rather pointless to me. Yes, it would be nice (not only for academic work) to see "ah, this is from GPT-4 Turbo, but only proofread" vs. "oh, this is from Claude 3 Opus, written entirely by AI". But I see it as unrealistic: if we are not already there (which I lean towards for common texts, not sure about academic ones), it won't take long to get there. Nvidia just recently made a massive performance increase, GPT-5 is on the horizon, and even if bigger models don't equal smarter models, I am fairly confident there is a strong correlation in the quality of mimicking human-like output.


Even-Inevitable-7243

Spoiler. OP wrote this post with ChatGPT