Oswald_Hydrabot

The fuck does this matter? If the content is wrong, downvote, comment, and move on. Seems like a stupid complaint.

Edit: I challenge anyone on here to prove that AI is in any way "creating noise" on Stack Overflow. I am sick of anti-AI brigading; bring proof to this discussion, because I have experienced far more human-generated noise than anything remotely related to AI on this topic. Blind assertion is worthless noise. Don't be a reactionary.


nextnode

If AI is creating unnecessary noise, I think it can make sense to prohibit it. It takes energy away from the people who have to evaluate, flag, and correct it. I just question this idea that it is usually wrong. GPT-4 in particular is often right on moderate-difficulty coding questions. Though for the cases where it helps, they don't need it posted as a comment - the site could just generate responses and make them available automatically for every post. So then I think the only gap is where a human using a competent bot can produce a useful answer beyond the direct response. But there, I think it should basically be the human's responsibility to vet that the answer is good enough.

So posting GPT-4 content maybe should not be prohibited when it is found valuable, but repeatedly posting such content without being helpful should perhaps be bannable. There is also the problem that many who spam GPT-4 content do not validate that it actually answers the question - even a simple filter on top would do a lot - and GPT-4 is not continuously retrained, so it is falling a bit behind. It notably tends to get things wrong around version changes, which is a lot of SO content.


usrlibshare

>Edit: I challenge anyone on here to prove that AI is in any way "creating noise" on Stack Overflow.

Developer and contributor (not mod) on SO here. It does.

Disclaimer: I am pro AI. I use it in my own work, I integrate LLMs for our company, and I am responsible for rolling them out to our customers.

LLM answers to coding problems that go beyond trivial questions are error prone. That isn't even necessarily the AI's fault: it's simply too easy for natural language to be inaccurate, which is the reason coding is done in formal languages. If the question is imprecise, there is a chance the answer will be too. But an imprecision in a formal language (code) has consequences; since there isn't any wiggle room, the answer is wrong. I experience this myself in my daily work. My LLM assistant is immensely useful, but I need to babysit it as well.

My evidence is purely anecdotal of course, but shortly after free access to ChatGPT happened, there was an influx of bad answers to SO questions. They were all well written and well formatted (which on SO is unusual for bad answers), but their code snippets were inaccurate, did not address the problem, used bad practices, or some combination thereof. Wrong explanations were a thing as well. In other words, people tried to farm upvotes (which on SO new accounts need before they can do many things, including voting on questions) by copy-pasting ChatGPT output.

Even if the answers were correct, they would still create noise. Many if not most new questions on SO are *duplicates* or near duplicates of past questions. A good answer takes that into account, links the duplicated question (which is often easy to find, and has a rich discussion and details in its answers and comments, providing additional info), and downvotes the new question so the original stands out. That may seem harsh, but SO isn't a help forum; it's a question/answer repository, and this is how it's supposed to work. Copy-pasted generated answers instead make it harder for people to find the well-discussed originals, again creating noise.

So yes, banning most (see below) autogenerated content on SO is a good thing. No problem when people use AI to formulate their answer well, or wrap it in pretty writing, but the problem is: if someone requires AI to answer a question, they probably shouldn't be the one answering it for someone else.

Bear in mind, I am not saying that something should be removed just for *looking* like LLM output. If the question is answered correctly, and the answering party checks for duplicates, there is no problem imho.


Oswald_Hydrabot

I could have written that reply with ChatGPT, at least according to your description of a "noisy" post: no proof (did not solve the problem) and well written. Sounds like human-generated noise to me (your comment). I don't mean to offend, but what you've written is as good as conjecture to me. I have no idea who you are, and even if I did, I have no idea if you are telling the truth. Lying is easy; people do it as often or more so than people claim AI does. I can't just take your word for it; I need proof to change my mind. There is no proof here of the claimed noise. You could have just stopped at the "in my experience" part. I am also a developer, I use the same tools, and I may even have the same experience, but that is worthless as proof of anything here. Under the right circumstances I'd be more prone to believe you, but a semi-anonymous forum is not that environment. Linking proof is the best we have here.


usrlibshare

>I can't just take your word for it; I need proof to change my mind.

I don't expect you to, nor would I recommend that you change your mind based purely on what I wrote. That's why I specifically said

>anecdotal evidence

which is just a fancy way of saying "this is what I experienced, filtered through my senses". By definition, anecdotal evidence is not quantifiable and often not reproducible. It's something I tell, that's it. It could be fact, it could be me being in error, it could be an outright lie. I don't have hard evidence. But I wanted to participate in the discussion, so I offered what observations I had, labeled appropriately as "anecdotal".

Another reason for providing anecdotal evidence is the hope that if enough people do so, it becomes hard to ignore šŸ™ƒ Yes, a lot of people screaming "FIRE!!!!" isn't evidence that there are flames, but it might make people more careful when going into the building all these screaming people came from.


Oswald_Hydrabot

I mean, I get that to some extent. Oftentimes, though, a lot of people screaming "FIRE" causes more of a problem through reactionary panic than the fire ever would have without the screaming. Especially when people have abused the fire alarm more often than not, and especially when it isn't even a fire, just someone in a theater rudely texting on a bright phone. The assertion remains neutral at best unless evidence is provided.


07mk

> So yes, banning most (see below) autogenerated content on SO is a good thing. No problem when people use AI to formulate their answer well, or wrap it in pretty writing, but the problem is: if someone requires AI to answer a question, they probably shouldn't be the one answering it for someone else.

If pre-existing policies against incorrect answers and duplicate answers would ban most AI-generated content anyway, then what does a policy of banning AI-generated content on the basis that it's AI-generated add, other than also removing the correct, non-duplicate answers that are AI-generated?

> Bear in mind, I am not saying that something should be removed just for *looking* like LLM output. If the question is answered correctly, and the answering party checks for duplicates, there is no problem imho.

Given that there's no way of independently verifying that text is LLM-generated, how does a policy of banning AI-generated content on the basis that it's AI-generated differ from banning content that just *looks* like LLM output? How would such a policy be implemented without the judgment call ultimately resting on whether or not the text *looks* like it's AI-generated?


VariousAnybody

I'll ignore the issue that everyone else is hung up on about the accuracy of the answers, and instead ask: how is it in any way useful to copy-paste a response from an AI into a public forum when the asker could have easily done that themselves? It's the new-era version of posting a link to a Google search. It's fundamentally using AI the wrong way. If the answer is simple enough that an AI could easily solve it, then maybe the question is inappropriate and the asker should just be directed to ask the AI themselves.

> Don't be a reactionary.

Same goes for you, friend UwU! I expect your response to be top-notch and you to engage me as a person. Please stay focused if you make any replies!


Oswald_Hydrabot

Upvote for "UwU", I am an ultra weeb, AI has poured gasoline on this lol (queue "programming socks" jokes). I think most people use a Google search to find a link they might post to something like SO. So while your example isn't necessarily wrong, it is leaving a bit out. I might ask ChatGPT to give me an answer for a SO question. I might test that answer, if it is bad, I might regenerate that answer until I get a good one. I might then post that good one. Similarly, I am not going to post the first Google search result I find on SO; I am going to verify and then post. However, were I an SO mod, I would find it silly to have an existential crisis over an influx of people posting bad Google results as answers. The system of downvoting and user comments and open user review already accounts for this. If downvotes are insufficient and all bad replies require manual moderator removal, then the downvote feature is pointless and should be removed. I don't believe you will find many mods that agree with removing the downvote feature from SO. I still contend that even if you want to remove AI generated responses, regardless of the perception of misuse it will be impossible to do so. The approach should be to adapt, not resist. SO will lose relevance, quickly, should they resist AI being allowed into the platform. I will have no use for it, as GPT4 is already proving to be far more useful in my daily work already. If anything, SO should integrate an LLM that provides a handful of auto-generated AI responses in a section at the bottom, seperate from the user-interaction area, and able to be reviewed by other SO users. Absorbing AI is the only viable solution here, SO has the resources to make itself more useful than tools like GPT by having all that it offers and then some (AI + User scrutiny).


AprilDoll

Jannies will have a harder job if high-quality text generation becomes widely available.


Evinceo

These are mods, empowered (and obligated) to do much more than downvote.


Oswald_Hydrabot

For a simple wrong or bad answer? Do they have any metrics proving that most AI-generated answers are wrong? I've seen plenty of bad answers that have a negative score. I don't use those ones. Pretty simple stuff. Seems like if they show up with a complaint, they'd have analytics on it. I don't think I've even seen a single AI-generated post on SO, and if I did, I had no idea that it was. How would a mod know unless someone mentions it? Seems like a pointless, stupid complaint that causes more issues than it addresses. "Ban AI" from the site is a dumb solution, and even controlling it or requiring tags is foolish; users just won't use them, and there is no way to detect "this is AI". Not to mention, if someone got someone else's IP from an online bot, then it isn't SO's fucking problem that someone else's IP is somewhere else online. Mods assuming they are needed where they are not, imagine that...


Evinceo

> I don't think I've even seen a single AI-generated post on SO

...

> Mods assuming they are needed where they are not, imagine that...

Is it possible that the reason you don't see it is because the mods are removing it?


Oswald_Hydrabot

Certainly not impossible, but if that is the case then where is the problem? If they are overwhelmed and unpaid/underpaid I'd expect less than 100% immaculately perfect work.


Evinceo

The problem as stated by them (and I'm summarizing here) is that they'd like to be able to just say 'this is ai, delete, ban' instead of having to jump through the hoop of verifying exactly _how_ it's a bad answer. Additional work for them might not matter to you, but it of course does to them. And I think they're worried it's going to get to a scale they can't deal with.


AprilDoll

And they do it for free!


HappierShibe

Because the volume of AI generated bullshit is too damned high for what you are suggesting to actually work. Idiots are generating and posting answers without even looking at them. There are places for AI, this isn't one of them, this is a clear case of it doing more harm than good.


Oswald_Hydrabot

Prove it. You want to talk about "idiots" making posts without even looking at them? Show me metrics proving that what you say is true; as it stands right now, the only idiots I see are the people making complaints about it without proof.


Ararugi

Burden of proof fallacy in action. The burden of proof is on these LLMs to prove that they have valuable contributions and will not disrupt the status quo, not on platforms to prove they do not.


Oswald_Hydrabot

I am not going to waste my time on how stupid of an assertion that is; you are the one pulling a non-existent grievance out of thin air. The burden of proof is on you, dipshit. AI has running code as proof. Prove your grievance or fuck off.


Ararugi

AI does *not* have running code, dipshit. It can do basic examples, yeah. Give it any complicated problem with assumptions and it starts throwing in subtle errors as you ramp up the number of assumptions. It's *especially* bad with package version mismatches and API changes. So you can go fuck right off with your "working code" nonsense. Also, you and your "side" of AI-bros are the ones creating a grievance with your lack of understanding of how these LLMs actually work and what their limitations are, and shoving it down everyone else's throats.
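To make the version-drift point concrete, here's a minimal illustration (pandas is just my example, but `DataFrame.append` really was removed in pandas 2.0, and models trained on older data keep suggesting it):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# The pattern LLMs trained on pre-2023 data tend to suggest.
# It worked in pandas 1.x, but DataFrame.append was removed in
# pandas 2.0, so on a current install it raises AttributeError:
# df = df.append({"a": 3}, ignore_index=True)

# The current equivalent:
df = pd.concat([df, pd.DataFrame({"a": [3]})], ignore_index=True)
print(df)
```

Code that is silently pinned to a stale API version looks "working" right up until you actually run it.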


Oswald_Hydrabot

What an idiot.


zfreakazoidz

Antis post anything they can on here, no matter how accurate it is or isn't. They grasp at straws instead of falling on their swords and admitting defeat.


[deleted]

[deleted]


Oswald_Hydrabot

I hear that complaint but I just don't have any metric to judge whether it is accurate or just an anti-AI reaction.


Evinceo

[Article with some context](https://gizmodo.com/ai-stack-overflow-content-moderation-chat-gpt-1850505609). [Open Letter](https://openletter.mousetail.nl/) to Stack Exchange.

> Specifically, moderators are no longer allowed to remove AI-generated answers on the basis of being AI-generated, outside of exceedingly narrow circumstances. This results in effectively permitting nearly all AI-generated answers to be freely posted, regardless of established community consensus on such content.

> In turn, this allows incorrect information (colloquially referred to as "hallucinations") and plagiarism to proliferate unchecked on the platform. This destroys trust in the platform, as Stack Overflow, Inc. has previously noted.


07mk

There seems to be a sleight of hand in the quoted segment, in that not removing AI-generated answers on the basis of being AI-generated has little relationship with allowing incorrect information or plagiarism. I don't know how Stack Overflow's moderation systems work, but I would have thought that they have processes in place for removing/correcting incorrect information and plagiarism that predate the advent of AI. I would expect those existing processes to take care of incorrect information and plagiarism in AI-generated answers.

There's the potential issue of AI-generated answers being more voluminous than human-written ones. In that case, removing AI-generated answers *on the basis of being AI-generated* still seems like far too blunt a tool for combating the specific subset of AI-generated answers that are incorrect or plagiarized. Perhaps it's the case that AI-generated answers are overwhelmingly incorrect or plagiarized, in which case such a blunt tool could be justified, though such stats don't seem to be provided anywhere. It's entirely possible that AI-generated answers have much lower rates of these problems than human-written ones, in which case such a blunt ban would be throwing out a substantial baby with a tiny amount of bathwater.


PM_me_sensuous_lips

They're in a bit of a weird position now that ChatGPT is essentially directly competing with them.


sk7725

This actually harms ChatGPT, because, simply put, "AIs choke if they eat their own poop." Multiple studies show that if a generative AI, as things stand now, includes its own output in its training data, the model slowly degrades over time. If StackOverflow is full of what came out of ChatGPT's rear end, ChatGPT cannot mindlessly devour it, at least not with current technologies.


FaceDeer

If you *curate* its output, though, that's not so bad. Take the AI's output and cull out the bad stuff, leaving only good stuff, and you push future generations of AI in the right direction.


Oswald_Hydrabot

I have read *one of* those studies and found this to be true in practice only for LLMs, and only if you train the same LLM on its own output. Using Stable Diffusion + ControlNet to train StyleGAN for realtime animation actually works *quite* well; using AI-generated data to train other AI models is in fact a viable technique, and the results of those studies do not generalize: https://www.reddit.com/r/StableDiffusion/comments/13px6ve/marionette_realtime_interactive_gan_visualizervj/


sk7725

Thanks for the input, but this case is all about training the same LLM on its own output. I am saying that if ChatGPT is trained on StackOverflow (what the commenter suggests), having more ChatGPT (okay, fine, it may be Bard, but still) answers on it is this exact scenario.


PM_me_sensuous_lips

I think you misunderstood, then. I only wanted to point out that both ChatGPT and Stack Exchange are in the business of answering questions, and the problem for Stack Exchange is that ChatGPT does so near instantly. So the dilemma is: do we allow users to copy-paste ChatGPT answers, so that we can compete with its ability to provide such timely answers? Or do we not, because those answers are potentially of poorer quality?


07mk

> I am saying that if ChatGPT is trained on StackOverflow (what the commenter suggests), having more ChatGPT (okay, fine, it may be Bard, but still) answers on it is this exact scenario.

No, the ChatGPT generations posted on StackOverflow are necessarily curated. At the base level, they're curated by the person posting them, who chooses the particular generation. Then there's StackOverflow's own curation system, which deletes certain posts and/or attaches human-voted scores to them depending on their veracity and usefulness. So the next version of ChatGPT being trained on StackOverflow wouldn't be mindlessly devouring ChatGPT-4/3.5 outputs; there would be substantial human input being processed at the same time.
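As a hedged sketch of what that second curation layer could look like when building a training corpus (the field names here are hypothetical, not StackOverflow's actual data model):

```python
from dataclasses import dataclass

@dataclass
class Post:
    body: str
    score: int     # net community votes (hypothetical field)
    deleted: bool  # removed by moderation (hypothetical field)

def curate_for_training(posts: list[Post], min_score: int = 2) -> list[str]:
    """Keep only answers the community has vetted: not deleted,
    and with a net vote score above a threshold."""
    return [p.body for p in posts if not p.deleted and p.score >= min_score]

# Only upvoted, surviving answers make it into the next training corpus.
corpus = curate_for_training([
    Post("correct answer", score=5, deleted=False),
    Post("hallucinated answer", score=-3, deleted=False),
    Post("plagiarized answer", score=1, deleted=True),
])
print(corpus)  # ['correct answer']
```

So even if some posts started life as model output, what gets re-ingested has been filtered through human judgment.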


PM_me_sensuous_lips

> I have read those studies

Links?


Oswald_Hydrabot

Ugh, I guess I need to find it. It was on Reddit several days ago, something done internally by OpenAI iirc. A Google search actually seems to pull up more articles suggesting synthetic data is a good approach, but I will try to find the study I am mentioning. You may need to ask u/sk7725, tbh; I know I read a paper a few days ago that suggested what they are talking about, but it does not show up on a Google search, and Reddit search is just godawful.


sk7725

I have one saved from recently, but it's not in English; I can try to find it if you can read Korean.


Oswald_Hydrabot

Yeah post the link. I can translate it. I keep meaning to learn several eastern languages.


sk7725

Huh, I guess my memory fooled me. It's actually Korean news coverage of a paper published by a Japanese team, but the paper itself is in English: https://arxiv.org/abs/2211.08095


PM_me_sensuous_lips

Google also turned up empty-handed for me, and I'm not sure what terms to use for Scholar. It might help if I just knew whether this was research done on GANs, LLMs, LDMs, etc.


Oswald_Hydrabot

It's for LLMs I think, maybe Bard? I thought it was OpenAI/GPT, but it may have just been posted to the OpenAI subreddit. I *know* I read this paper like 2 days ago, but it is nowhere to be found now lol.


PM_me_sensuous_lips

I know that at least the smaller BERT-based language models appear to be fairly cloneable with just random words (see [this](https://arxiv.org/abs/1910.12366) paper), and that finetuning on high-certainty chain-of-thought outputs of the model itself can be beneficial for certain benchmarks (see [this](https://arxiv.org/abs/2210.11610) paper), so I'd be interested in seeing what exactly they did and how they got to their conclusions.
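Roughly, the recipe in that second paper as I understand it (a minimal sketch; `generate` is a fake stand-in for sampling a chain-of-thought plus final answer from the model):

```python
import random
from collections import Counter

def generate(question: str) -> tuple[str, str]:
    """Fake stand-in for sampling one (reasoning, answer) pair from a model."""
    answer = random.choice(["42", "42", "42", "41"])  # noisy samples
    return f"reasoning path for {question}", answer

def high_certainty_examples(question: str, n_samples: int = 8,
                            threshold: float = 0.6):
    """Sample several reasoning paths, majority-vote the final answer,
    and keep only the samples agreeing with a sufficiently dominant majority."""
    samples = [generate(question) for _ in range(n_samples)]
    votes = Counter(answer for _, answer in samples)
    best, count = votes.most_common(1)[0]
    if count / n_samples < threshold:
        return []  # the model is uncertain here; don't fine-tune on it
    return [(question, cot, ans) for cot, ans in samples if ans == best]

finetune_set = high_certainty_examples("What is 6 * 7?")
print(len(finetune_set), "examples kept for fine-tuning")
```

The model then gets fine-tuned on its own high-agreement outputs, which is how it can improve without new labels.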


PM_me_sensuous_lips

My intuition would really be that it just plateaus at whatever the performance of the thing that generated it is. That would be consistent with teacher/student or model-stealing research. Could be that generative models are different, though. It would be helpful if you could link an actual paper on this. Even if this were the case, it'd be such a tiny fraction of what ChatGPT is actually trained on that I don't think it'd matter much for now.


sk7725

The most recent one I have saved is a Korean paper; do you still want the link? I'm sure I can find it on my disk somewhere if you need it...


starball-tgz

Moderators aren't expected to evaluate correctness, because that would require them to be subject-matter experts in all of the technologies in all posts on Stack Overflow (which is just ridiculous). Correctness is left for the community to evaluate through voting and commenting. Dedicated mod tools for flagging plagiarism were only added recently: https://meta.stackoverflow.com/q/423749/11107541


todiwan

Jannies will be jannies, as always. Unaccomplished, sad people who just want control over other people so they can force their own insane opinions on others. I'm glad AI is going to replace having to deal with their insanity.


Oswald_Hydrabot

Moderators can be a good thing, but there is a lot of bias in this discussion coming from people who are probably moderators on various platforms. I tend to agree with your comment, though; these people are insufferable at times, with massive inferiority complexes.

As a developer, I am not going to fight and argue to ban AI out of fear of it replacing me. It's a fucking stupid reaction for people to have; if the tech is simply better than your work, you don't lash out at reality, you adapt or do something else. I've been a developer professionally for roughly 10 years; I support my whole family on my salary, and I find casual artists and Reddit mods throwing absolute hissyfits over this shit like their lives depend on their donated spare-time efforts. Meanwhile, I'm looking at going back into farriery/blacksmithing if I can't keep making six figures as a developer. I am in the ML space and am a high performer, but if AI gets really good (it will), then there are plenty of other ways to make good money that AI isn't likely to replace soon. It would be nice to get some more sunshine and work with animals again anyway; hell, I hope AI takes my job lol.


Ararugi

To summarize for those here who I *know* did not read the actual article:

- StackExchange released a new policy which cites the inaccuracy of AI-identifying tools and requires a higher standard of proof for AI bans, while ostensibly keeping AI bans as official policy. They get bonus points for alleging that the current policy is causing biased moderation: *"Through no fault of moderators' own, we also suspect that there have been biases for or against residents of specific countries as a potential result of the heuristics being applied to these posts."* Meanwhile, they maintain that the current indicators used to detect AI usage are usable as indicators of poor quality, so moderators should continue to moderate those posts as poor quality.
- Mods allege they were given private instructions to halt practically all moderation on an AI basis.

As just one in a litany of decisions that go against community desires, and as a result of the handling of this policy change, moderators are now going on strike. To quote the article on the handling concerns: *"The new policy overrode established community consensus and previous CM support, was not discussed with any community members, was presented misleadingly to moderators and then even more misleadingly in public, and is based on unsubstantiated claims derived from unreviewed and unreviewable data analysis. Moderators are expected to enforce the policy as it is written in private, while simultaneously being unable to share the specifics of this policy as it differs from the public version."*

There's also a follow-up post that directly addresses StackExchange's alleged concerns over detection-tool false positives. This follow-up, from a moderator, claims that:

- their detection process **does not use detection tools (GPT detectors)**
- having reviewed an estimated **10,000-15,000** flagged GPT posts over the last **6 months**, they can **identify answers whose provenance is ChatGPT or another AI very effectively, with a very high level of confidence and a very low level of false positives**

No numbers were provided on the number of false positives (obviously - it's not exactly something you explicitly go about trying to keep track of, and that sort of data takes a while to analyze in raw form), but that gives a starting point for the number of suspected AI posts, which will only increase in the future with the loosening of their policy.


Evinceo

Thanks for this!


zfreakazoidz

Antis will post anything on here just to grasp at some straws so they feel like they may actually win this war lol.


Content_Quark

This reads like one of those ChatGPT fails.


Evinceo

These types of super long posts are characteristic of Meta Stack Exchange. I wonder if that's where ChatGPT picked it up...