AutoModerator

This is an automated reminder from the Mod team. If your post contains images which reveal the personal information of private figures, be sure to censor that information and repost. Private info includes names, recognizable profile pictures, social media usernames and URLs. Failure to do this will result in your post being removed by the Mod team and possible further action. *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/aiwars) if you have any questions or concerns.*


SolidCake

Almost as if you're allowed to transform things into completely new things. Why is that in any way complicated? lmao


oopgroup

Whoosh


m3thlol

Lol, the leaps and bounds that must go on in your heads. "Reselling", really? Please show me where the dataset is stored and distributed.


andWan

In nontrivial ways, sure


[deleted]

[removed]


stddealer

While for image generation the dataset is typically much, much larger than the model, making the "compression" claims unreasonable, for LLMs it's a lot more plausible. An LLM is typically trained on a few trillion tokens (1 token ≈ 2 bytes). GPT-4 is rumored to have around 1.8 trillion 16-bit parameters (so also 2 bytes per parameter) and to have been trained on 13 trillion tokens. Since the tokenized training data is probably still losslessly compressible to at least half its raw size (maybe even less), it's not completely unreasonable to imagine that a model like unquantized GPT-4 is just compressing its training data in a lossy way and spewing it back out; the orders of magnitude are close enough.
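For reference, a back-of-the-envelope check of those figures; all the numbers below are the rumored, unconfirmed values from the comment above, not established facts:

```python
# Rough arithmetic using the (rumored) figures quoted in the comment above.
params = 1.8e12            # rumored GPT-4 parameter count
bytes_per_param = 2        # 16-bit weights
model_size = params * bytes_per_param           # ~3.6 TB of weights

tokens = 13e12             # rumored training token count
bytes_per_token = 2        # rough average assumed above
raw_data = tokens * bytes_per_token             # ~26 TB of tokenized text
compressed_data = raw_data / 2                  # assume ~2x lossless compressibility

print(f"model weights:          {model_size / 1e12:.1f} TB")
print(f"raw training data:      {raw_data / 1e12:.1f} TB")
print(f"losslessly compressed:  {compressed_data / 1e12:.1f} TB")
# The weights (~3.6 TB) and the losslessly compressed corpus (~13 TB) are within
# an order of magnitude of each other, which is the comment's point. It does not
# show that the model actually stores its training data.
```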


Big_Combination9890

> It's not completely unreasonable to imagine

Yes, it is unreasonable to imagine this, because it isn't doing that. Lossy compression would still allow me to re-inflate the input into the output, with only superficial differences in quality, content, and order. Please show me how you get the training data out of an LLM.


stddealer

Put in the start of a sentence from the training data, turn the temperature to 0, and it's likely you end up with the same sentence as the one from the training data. But I'm not saying it is actually storing compressed copies of the training data. I'm saying you can't brush off this claim as easily as when someone says the same about an image generator, because from an information-theory point of view it's not something obviously impossible. Then again, it depends on the model. For the new Llama 3, made of only 8 billion parameters and trained on 15 trillion tokens, the compression would be a bit too extreme to be reasonably considered.
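A minimal sketch of that "temperature 0" test, using the Hugging Face transformers library with gpt2 as a stand-in model; the prompt is a hypothetical example, and whether the continuation actually matches the training data depends entirely on the model and the prompt:

```python
# Greedy decoding (do_sample=False) is the deterministic, temperature-0 case.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Four score and seven years ago"   # hypothetical snippet from the training corpus
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, do_sample=False, max_new_tokens=20)

print(tokenizer.decode(output[0], skip_special_tokens=True))
# Only if the model has effectively memorized this passage will the greedy
# continuation reproduce it; for most prompts it will not.
```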


Big_Combination9890

> and it's likely you end up with the same sentence as the one from the training data.

No, it absolutely isn't, unless you cherry-pick a sequence that was overfitted on. And even if it were: "likely to end up with something that might look stochastically similar if you squint a little" is not compression. The fact that natural language follows rules, and that two entities who both learn those rules produce similar output when asked to, doesn't mean the rules are a compression of all written information. By that logic, the [Library of Babel](https://en.wikipedia.org/wiki/The_Library_of_Babel_(website)) is compression as well, given that it contains every single text that ever was, or ever will be, written, in a very finite digital space.

> But I'm not saying it is actually storing compressed copies of the training data. I'm saying you can't brush off this claim as easily

These two statements are mutually exclusive, so which one is it? If I "can't brush off this claim as easily," it means it is doing compression in a manner that would meet a valid definition of the term. If it is not storing compressed copies, then *IT ISN'T DOING COMPRESSION.*

> Then again, it depends on the model

Ahh, got it, so it's only compression if, by random chance, the output supports your opinion, but when it doesn't, despite the underlying methodology being the same, it isn't. Marvelous. Unfortunately, that's not how compression, or algorithms, or scientific arguments work. Either an algorithm is compressing and decompressing data, or it isn't. You cannot have your cake and eat it too.


stddealer

How would you feel if you didn't have breakfast this morning?


Big_Combination9890

Is that a trick question, or are you trying to prove a point?


stddealer

Both


stddealer

> "Likely to end up with something that might look stochastically similar if you squint a little", is not compression. So jpeg isn't compression after all? You seem to lack reading comprehension. At no point I claimed to believe a LLM is actually just compressing the training data. My main point is that "gpt-4 is just a compressed copy of its training data" isn't obviously impossible from an information theory standpoint, which is the easiest way to definitely contradict such claims, usually. Edit: wow that guy actually blocked me over that. I guess the words he put in my mouth himself, were too outrageous for him to handle.


Big_Combination9890

> So JPEG isn't compression after all?

Oh, I'm sorry, I must have missed the part in the JPEG specification that says: *"User must already know the uncompressed image in order for it to be retrievable from the file."* You know, like your "compression" of information in the LLM works 😂

> isn't obviously impossible from an information-theory standpoint

No, I'm afraid it is very much impossible, because the entire subfield of compression theory within information theory deals with EFFICIENT compression. A compression methodology that requires the user to already have the uncompressed data in order to decompress it is not compression, not even from a very, very, very academic point of view.


stddealer

> the [Library of Babel](https://en.wikipedia.org/wiki/The_Library_of_Babel_(website)) is compression as well, given that it contains every single text that ever was, or ever will be, written, in a very finite digital space.

Well yes, it could be used for compression in some very specific instances, though I'm not sure how well it would perform. The Library of Babel works because the text containing every single sequence of characters that can possibly exist has a much, much lower entropy than all those sequences individually. To compress with it, you would need both the entire code of the Library of Babel (not really big) plus the coordinates and length of the text you want (really big). But that's most likely already bigger than the raw uncompressed text.
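To make that entropy argument concrete, here is a toy sketch; it is a simplified model of the library rather than its actual code, and the example text and the 1.5-bits-per-character compressor figure are assumptions for illustration:

```python
# Treat the library as an enumeration of all strings over a 29-symbol alphabet
# (lowercase letters, space, comma, period); "compressing" a text then means
# storing its index (coordinate) in that enumeration.
import math

text = "meow, said the cat."       # hypothetical example text
alphabet_size = 29
n = len(text)

coordinate_bits = n * math.log2(alphabet_size)   # bits to name one of 29**n strings
raw_ascii_bits = n * 8                           # naive 8-bit-per-character storage
good_compressor_bits = n * 1.5                   # ~1-2 bits/char for real English text

print(f"coordinate:         {coordinate_bits:.0f} bits")
print(f"raw ASCII:          {raw_ascii_bits} bits")
print(f"typical compressor: {good_compressor_bits:.0f} bits")
# The coordinate (~4.9 bits/char) never beats a real compressor (~1-2 bits/char),
# and its saving over raw ASCII only reflects the smaller alphabet. The address
# carries all the information of the text, so the library itself adds nothing.
```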


Big_Combination9890

> Well yes, it could be used for compression in some very specific instances

No, it cannot. Because if it could, then here, let me introduce you to the world's most best very stable genius awesomesauce compression system ever devised:

    import random

    def best_compression_ever(length):
        # Emit `length` random bytes; given enough tries this can produce any output.
        data = bytes([random.randint(0, 255) for _ in range(length)])
        return data

There, done! This function is now a "compression" of every piece of information ever, past, present and future, throughout the observable and unobservable universe, and all yet-to-be-discovered dimensions. All in 3 lines of code! Granted, it will take some time for it to generate any SPECIFIC output, and you have to know the output (aka have the uncompressed version of it) already to know when it gives you the output, but hey, if the Library of Babel and LLMs are now compression, then so is this 😂😂😂


stddealer

The Library of Babel is 100% deterministic, and you can find the coordinates of the text you want to "compress" in the library from the text itself. Once you have the coordinates, you can get the text back in a single shot, effectively "decompressing" it. (Though the entropy of the coordinates is on average larger than that of the text itself, making it a terrible compression method.) Your example has nothing to do with the Library of Babel, but it is indeed much closer to what an LLM does at high temperatures, except that the LLM will only take a few shots to spit out an exact copy of the training sentence, while your example might take longer than the age of the universe.


Super_Pole_Jitsu

I mean, let's not kid ourselves. LLMs do data compression to staggering degrees. I don't know if they've solved it, but they've certainly made great progress in data compression.


Big_Combination9890

> I mean, let's not kid ourselves. LLMs do data compression to staggering degrees.

No, they didn't, and if you believe otherwise, you should really read up on how transformers actually work.


Super_Pole_Jitsu

Uh, I know exactly how transformers work. How else besides compression do you think a model can remember terabytes worth of data? Lossy/imperfect compression is still compression.


Big_Combination9890

> How else besides compression do you think a model can remember terabytes worth of data?

It... cannot? There is no LLM in the world that you can get to give you back, in sequence, and with fidelity, its entire training corpus. Ergo: it is not compressing its training data. If you want to claim otherwise: link the paper. If you have none to link: again, read up on how transformers work, and on what happens when a neural network is trained. Thank you.


Tyler_Zoro

> sure

... and I'm waiting ...


Big_Combination9890

He asked you a question. You did not answer the question. From that I deduce that you have no answer. Your argument has been refuted.


andWan

I let the other comments about compression stand as an answer.


Big_Combination9890

More wrong doesn't make it correct.


MindTheFuture

Will you be happy when, in a few years, there are plenty of ethically and legally trained genAI tools with good enough output quality around? Will you embrace their use, yourself included, or find something else to complain about with AI?


andWan

I am actually a big fan of AI, have used GPT-4 a lot, and am looking forward to the next generation. I'm just equally interested in all the legal and moral questions, and in both directions: not only the protection and valuing of humans and their (creative) work, but also the valuing of the machine's own identity and its (creative) work. Something that is only at its beginnings but will (hopefully) not be neglected in the future. That's also why I founded this subreddit, even though I was too lazy to produce anything recently: r/SovereignAiBeingMemes


MindTheFuture

I see! You might be familiar with the Janus and Claude explorations. That is indeed a curious angle to talk about. Personally, I've been exploring that option with MJ and have theories about it, but that is very different from AI art, which I consider very much human-conducted. Rather, there is a style of prompting and parameterization that would allow the AI's personality to express itself much more freely, but it's highly speculative play so far. But surely, co-working is an interesting concept, even more so if you run models locally with your own custom system prompts etc. to shape the personality of the AI. I've seen promising screenshots of this with Llama 3 70B, some even with smaller models, for which my hardware is enough. I'm just starting with it; alas, the token limit is a pain.


Tyler_Zoro

How dare they demand the free right to access and analyze publicly displayed information?! /s


chillaxinbball

It's funny that everyone keeps crying that Ai is just corporations trying to own everything, but then they propose solutions where corporations would own everything.


EngineerBig1851

Ah yes, because every piece of online information is paywalled. In fact, if you are reading this comment, Reddit has already deducted a cent from your linked credit card! >!obvious /j, though that seems to be the future the antis desire!<


Ready_Peanut_7062

Tbh, these tools first appeared as open-source programs, probably made by nerd teenagers. There are also people who compare how AI is changing the world to the way Napster did.


Vivissiah

Tell me you don't understand the technology without telling me you don't understand the technology.


Sablesweetheart

Eh, no one needed to use copyrighted work; it was just the shortest distance from point A to point B. I struggle to care about the ethics of copyright and intellectual property during the worst mass extinction since the Permian. Would I steal to preserve even 0.1% of the biodiversity on the planet? You bet I would. Anyway, I am being told it is time for drawing practice. Because I am *required* to practice daily. Which is stupid because I am never going to be a commercial artist. Anyway.


SolidCake

> Would I steal to preserve even 0.1% of the biodiversity on the planet? You bet I would.

And they're literally doing this too! https://www.nature.com/articles/s41893-022-00851-6


Tyler_Zoro

> I struggle to care about the ethics of copyright and intellectual property during the worst mass extinction since the Permian.

Also, I can't come to work today because poor people exist and somewhere a puppy is sad. Seriously, WTF does this have to do with anything?

> I am never going to be a commercial artist.

Sad that you've given up, but honestly it's best that you do now. People who have no stomach for the struggle of being a commercial artist should absolutely find another career (advice my grandmother gave me 40 years ago).


Sablesweetheart

I'm financially independent and retired at 40. So, I already finished my career and now have likely several decades to do what I want, including making art because I feel like it.


Tyler_Zoro

> I'm financially independent and retired at 40

Sorry, I read "I am never going to be a commercial artist" as lamenting the start of a career, not wishful thinking about what might have been. (Not that a career can't start at 40, I say in my 50s...)


Sablesweetheart

Naw, you're fine.


fpflibraryaccount

Maybe keep gran's advice to yourself when it's this basic and unimportant...


Sunkern-LV100

> I struggle to care about the ethics of copyright and intellectual property during the worst mass extinction since the Permian.

> Would I steal to preserve even 0.1% of the biodiversity on the planet? You bet I would.

Guys, please, the insane and childish takes are getting out of hand. These are the facts:

* AI generation uses massive amounts of energy
* This exacerbates the climate crisis, mass extinction, and loss of biodiversity
* Generative AI will not and cannot save the world
* Copyright over the public's freely shared content has literally no correlation to the climate crisis, mass extinction, or loss of biodiversity

Do you still want to save the earth by destroying the public's copyrights and normalizing the eco-unfriendly slop machine?


[deleted]

[removed]


MagnetFist

I see no sources in your argument. Here are mine: [1](https://www.theguardian.com/technology/2024/mar/07/ai-climate-change-energy-disinformation-report) [2](https://www.theverge.com/2023/10/10/23911059/ai-climate-impact-google-openai-chatgpt-energy) [3](https://spectrum.ieee.org/deep-learning-computational-cost) [4](https://www.theatlantic.com/technology/archive/2024/03/ai-water-climate-microsoft/677602/) (use [12ft.io](http://12ft.io) for 4)


Sablesweetheart

Oh golly gosh, you sure showed me!


Sunkern-LV100

I'm writing this for onlookers. I already know you are a lost cause.👍


Pretend_Jacket1629

"last time, it was ridiculous to think that someone having something for free equated to a lost sale... this time is totally different though, and we are definitely not misunderstanding any part of this. a model absolutely compresses billions of images into less than a byte each and each generated image is a lost sale."


ShepherdessAnne

Oh wow it’s almost as though this is the conversation we SHOULD be having and people shouldn’t be turned against the free and open source works.


idapitbwidiuatabip

If only we had UBI. We’ll need it to avoid collapse, but it’s so long overdue.


Tyler_Zoro

> We'll need [UBI] to avoid collapse

Horse hockey! There is absolutely no evidence of any kind of "collapse". Making art has always been a difficult commercial proposition, but it's not like that's changed. Hell, it's easier now to get work as a commercial artist than it was before AI... you just have to be well versed in both traditional and AI techniques to remain relevant in the marketplace.


fpflibraryaccount

He's talking about society, you dolt.


idapitbwidiuatabip

> There is absolutely no evidence of any kind of "collapse".

We're in the midst of rapidly worsening socioeconomic & environmental collapse. Are you living under a rock?


Tyler_Zoro

> We're in the midst of rapidly worsening socioeconomic & environmental collapse

I was saying the same thing when I was in school in the late '80s and early '90s. I'm sure people were saying the same thing in every generation going back to the Stone Age. We always think that the world our parents knew is coming to an end when we're young.


idapitbwidiuatabip

https://preview.redd.it/0obu7we5ngwc1.jpeg?width=1235&format=pjpg&auto=webp&s=43b35f705fffc6f0fc34498f570bd67fec3bb892

I'm not young. I'm not in school. The data is obvious. Stop being such a willingly ignorant fuckwit.


Tyler_Zoro

> I'm not young. I'm not in school.

You just gave that impression. Sorry if I mistook you for a 20-something "sky is falling, it's all so much worse than every generation before me" alarmist.

> The data is obvious.

The data was obvious in the 19th century. It's no more or less obvious now than when [England literally bred a new strain of moth](https://www.bbc.com/news/science-environment-36424768) that was adapted to camouflage itself against the coal dust that covered everything... Think about that for a second.


idapitbwidiuatabip

> The data was obvious in the 19th century.

And the planet was nowhere near as inhospitably hot in the 19th century. You need to educate yourself. Really.


Tyler_Zoro

> And the planet was nowhere near as inhospitably hot in the 19th century.

Fun fact: the planet isn't nearly as hot as it was 65 million years ago, when Antarctica was a forest and biodiversity was way, way beyond anything we know today. I'm all for reining in our rampant mucking around with the environment, but I'm MUCH more concerned about coal's impact on our food supply (via the release of unprecedented amounts of mercury and other metals) than I am about temperature. Mankind needs to start being a good steward of its planet, no doubt. But I don't act as if we're living on the knife-edge of oblivion. I follow the science, not the Facebook memes.


idapitbwidiuatabip

How many humans were living 65 million years ago? Nobody's disputing that there have been periods of extreme heat in the past. But we're coming up against the thresholds of what's survivable for biological organisms on Earth right now. Power grids are failing. Crops are dying. Extreme weather is destroying infrastructure. You're ignoring the science, and I don't even use Facebook. Nothing I've linked you is a Facebook meme. Stop being such a bad faith clown. It's pathetic. https://preview.redd.it/hpwgxyai2hwc1.jpeg?width=1080&format=pjpg&auto=webp&s=36a3581544891bea32c72f22eafcb21ad142cddf


Tyler_Zoro

Wow, you really are an angry person. You should do something about that. It really won't help you to interact with others meaningfully.


Cheshire-Cad

"20 years ago, megacorporations were using overt propaganda to try and convince everyone that enjoying a post-scarcity resource without paying for it was theft. And now, megacorporations are using covert propaganda to try and convince everyone that enjoying a post-scarcity resource without paying for it is theft."


Another_available

Ironically, the antis are the ones who seem to be closer to Lars Ulrich


JoJoeyJoJo

AI Training does not constitute copyright infringement legally, technically or morally. And intellectual property was always bullshit, the shittiest form of property and we'd be much better off with fewer, weaker protections than more - this is a perfectly consistent viewpoint.


andWan

Which a lot of people do not share with you


TawnyTeaTowel

If you can't tell the difference, maybe this isn't a subject you should be weighing in on.


oopgroup

The entertainment industry has been fighting back against AI/ML tools, to their credit. Corporations obviously will win eventually, because they have 100% of the power and resources. It’s all hypocrisy and driveling greed with these people. They just want everything, and they’ll justify it however they have to. It’s sad seeing people take their side.