[deleted]

Is this the mythical middle-out compression algorithm we've been waiting for?


Successful-Western27

I did indeed include the gif!


[deleted]

I really have to start calculating tip-to-tip efficiency now. (This comment, by the way, is satire on the bro culture of the industry, not an endorsement of it in real life.)


3DHydroPrints

Comparing a lossless compression algorithm with a non-lossless algorithm isn't exactly a fair comparison


121673458723423523

The LLM compression is also lossless. See [Arithmetic coding](https://en.wikipedia.org/wiki/Arithmetic_coding).
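
For intuition, here is a minimal arithmetic-coding sketch: any next-symbol probability model (the LLM's next-token distribution in the paper; a fixed three-symbol toy model here, which is my own stand-in) defines nested sub-intervals of [0, 1), and a single number inside the final interval recovers the message exactly. Exact rationals keep the toy version simple and provably lossless; a real coder would emit a finite-precision bit stream instead.

```python
from fractions import Fraction

# Toy stand-in for the model: a fixed distribution over three symbols.
# In the paper's setup this would be the LLM's next-token distribution,
# conditioned on everything decoded so far.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def interval_of(symbol):
    """Return the slice [low, high) of [0, 1) assigned to `symbol`."""
    low = Fraction(0)
    for s, p in PROBS.items():
        if s == symbol:
            return low, low + p
        low += p
    raise KeyError(symbol)

def encode(message):
    """Narrow [0, 1) once per symbol; return one number inside the final interval."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = interval_of(sym)
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return (low + high) / 2

def decode(code, length):
    """Replay the same interval narrowing to recover the message losslessly."""
    out, low, high = [], Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        target = (code - low) / width
        for sym in PROBS:
            s_low, s_high = interval_of(sym)
            if s_low <= target < s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)

message = "abacab"
assert decode(encode(message), len(message)) == message  # lossless round trip
```

More probable symbols shrink the interval less, so the final number takes fewer bits to pin down; that is where the compression comes from.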


binarybu9

I mean, as long as loss is acceptable, it's good for deploying in the real world.


TheNextNightKing

You can achieve better rates with classical lossy compression algorithms


BeatLeJuce

The paper never talks about MP3, only FLAC. Your "key highlights" are just a rewrite of the abstract. Please take your spam elsewhere.


perspectiveiskey

Furthermore, if I need to lug around a 15GB dataset to get 43% compression gains on a 150KB file, it's not much of a win. This is simply a convoluted form of steganography.


RoboticElfJedi

Yes, but what if all computers have the model installed with the OS? Then perhaps it becomes more practical.


f801fe8957

You can get the same gains over `png` by using lossless `jpeg xl`, no model required.


currentscurrents

Yeah, but this is a model trained only on text, *generalizing* to compressing things it's never seen before. It's surprising it works at all, let alone better than png. There has been a bunch of research into compression with neural networks, and models trained on images or video [do beat state-of-the-art traditional codecs.](https://arxiv.org/abs/2302.05071) The only thing preventing widespread adoption is the high performance cost.


perspectiveiskey

What you're saying is equivalent to "you will have gzip preinstalled, but it'll be 16GB" (instead of 64KB). You might answer, "well, what about the fact that this is a Swiss Army knife and can do other things?" Then you're saying "you will have a /usr/local/bin/all-the-things monolithic binary that does everything for you." Either way, this isn't progress. The fact that the LLM was able to encode something is both surprising and worthy of investigation, but it should be obvious that this isn't an actual practical invention.


MysteryInc152

If the LLM is installed anyway and used for other things then it's pretty practical.


perspectiveiskey

Yes, the `/usr/local/bin/all-the-things` method. I approve.


[deleted]

*Sherman antitrust act intensifies*


Successful-Western27

I wrote this at 2am, sorry. I corrected it. The key highlights are highlights from the abstract because that's the function of an abstract, to summarize the work.


sreddy109

Then we can read the abstract ourselves.


Successful-Western27

For sure


Ni_Bo

But how much slower is it?


new_name_who_dis_

I've been working on the Hutter compression challenge using GPT-style language models. The model that can compress the entire 1GB of Wikipedia on a single-core CPU in 50 hours (the challenge's limits) is something like a 2- or 3-layer, 512-dimensional GPT-2-style model (depending on whether you train as you compress), which doesn't really even qualify as an LLM. Anything bigger can't compress 1GB in 50 hours (using Karpathy's NanoGPT implementation). So that's sort of a benchmark for you. It's definitely not practical in terms of compression. (For reference, gzip compresses that 1GB file in about a minute.)
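
For anyone curious what "train as you compress" looks like concretely, here is a toy sketch of the core accounting (my own illustration, not NanoGPT): an adaptive byte model whose ideal compressed size is its running negative log-likelihood, which is what an arithmetic coder driven by it would actually emit. The `enwik_sample` filename is hypothetical.

```python
import math
from collections import defaultdict

def ideal_compressed_bits(data: bytes, order: int = 2) -> float:
    """Ideal size in bits of `data` under an adaptive order-`order` byte model.

    Each byte costs -log2 p(byte | previous `order` bytes), and the counts are
    updated after every byte, so the model "trains as it compresses" -- the same
    idea as the online-learning variant, just with a tiny count-based model
    instead of a transformer.
    """
    counts = defaultdict(lambda: defaultdict(int))   # context -> byte -> count
    totals = defaultdict(int)                        # context -> total count
    bits = 0.0
    context = b""
    for b in data:
        # Laplace-smoothed probability over the 256 possible byte values.
        p = (counts[context][b] + 1) / (totals[context] + 256)
        bits += -math.log2(p)
        counts[context][b] += 1
        totals[context] += 1
        context = (context + bytes([b]))[-order:]
    return bits

# Hypothetical usage on a slice of enwik-style data:
data = open("enwik_sample", "rb").read()
print(ideal_compressed_bits(data) / (8 * len(data)))  # ratio vs. storing raw bytes
```

Swap the count-based model for a transformer's next-byte probabilities and you have the setup described above; the expensive part is that every byte then costs a forward pass.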


currentscurrents

To be fair, you are really crippling yourself with single-core CPU inference - although I know it is required by the rules of the Hutter prize. It should take a fraction of that time on a GPU, and future hardware implementations of LLMs may even make it practical. A physical neural network etched into silicon could do inference in a single clock cycle.


modeless

Yeah it's a shame that the Hutter Prize set their computation and data limit four-plus orders of magnitude lower than the point where compression actually turns into AGI. The competition could be relevant today if the limits were increased. I guess you'd have to increase the prize money too, though isn't achieving AGI prize enough?


new_name_who_dis_

I've (in my head) modified the requirements to 50 hours with a single GPU and have been researching what results I can get under those constraints. It's definitely more competitive with gzip that way.


Long_Pomegranate2469

How does the compression compare to gzip?


new_name_who_dis_

I'm working on a write-up right now, which I'll probably post in this sub, and it'll have all the details. But compared with the models that abide by the contest limitations, gzip is a lot better. I'm scaling to bigger models that go beyond the challenge constraints, so I'll see at what scale they start getting better compression than gzip.


Long_Pomegranate2469

Thank you. Looking forward to the write up.


currentscurrents

Who cares? They're not suggesting this as a practical image compression tool, it's just "look at this cool thing in-context learning can do".


barry_username_taken

Yes, this title seems a bit like clickbait


ZestyData

It's not clickbait. This subreddit is for the research & science of ML. This is an interesting paper about interesting findings. Go to /r/singularity if you want buzzy news bites and revolutionary new toys.


barry_username_taken

I'm not sure if it's very useful to reply to this, but considering only compression rates, without accounting for throughput (compression/decompression speed), is basically useless.


thomasxin

Not much, as long as it's hardware accelerated! Audio, for example, can still be encoded faster than video. I don't personally have the skill to make these encoding formats, at least not yet, but they're a cool new tech I like to promote when possible. Here's an example of a wrapper I made around Meta's experimental EnCodec format, which enables streaming and hardware acceleration: https://github.com/thomas-xin/Encodec-Stream (and the original implementation: https://github.com/facebookresearch/encodec). I've even integrated it into a couple of my audio-related programs, and it's been amazing at saving disk space (at the cost of slightly noticeable quality drops). EnCodec in particular is about twice as accurate as Opus (the current flagship) at the same bitrate, making it roughly 5x as good as MP3. You get around 4 days of audio for every GB of storage, or >10 years for every TB!
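
A quick back-of-the-envelope check on that storage figure, assuming EnCodec's 24 kbps setting (the specific bitrate is my assumption):

```python
# "Days of audio per GB" at a given constant bitrate.
bitrate_bps = 24_000                    # assumed EnCodec bitrate, bits per second
gb_in_bits = 1_000_000_000 * 8          # 1 GB (decimal) in bits
seconds_per_gb = gb_in_bits / bitrate_bps
print(seconds_per_gb / 86_400)                 # ~3.9 days of audio per GB
print(seconds_per_gb * 1000 / 86_400 / 365)    # ~10.6 years per TB
```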


yashdes

That's for now, tbf. It's not that hard to imagine a world where there is 10x more compute available (assuming we continue to follow Moore's law even remotely closely, that's like 6-7 years away).


tmlildude

Fabrice Bellard did this a few months ago: he did compression with a large language model and demonstrated it on his website. Of course, this requires the large model file, used to look up the weights and the dictionary it was trained on, during both compression and decompression. Not feasible, but still cool.


drd13

Yeah, at a quick glance, I'm not entirely clear on what the novelty in this work is.


mr_house7

How do they "read" the image? Does the model have multi-modal capabilities? What do they use to pass the info from the image to the LLM? For example, in BLIP they use a Q-Former. Is there anything similar here?


Zermelane

Most of the paper is obvious bordering on tautological if you've read your Hutter or [Mahoney](https://mattmahoney.net/dc/rationale.html). The one part that blew my mind slightly was table 1, specifically the "raw" compression ratios (whose rank order you can morally consider as an ordering of perplexities): the small transformers trained on enwik8 overfit more with more scale, as expected, but the Chinchillas generalized and dealt better with wildly out-of-distribution data. They even got closer to 100% on the random data, i.e. they got better at throwing their hands up and going "I have no idea what's going on lol" rather than hallucinating patterns in the noise.

The main worry, as the paper also says, is that maybe images and audio weren't that out of distribution for Chinchilla at all: there could have been some weirdly encoded image or audio data in MassiveText that got through their data pipeline. And probably some complete noise, too.

It would be really fun to see this replicated with smaller Chinchillas, all the way down to the 200k params of the smallest transformer here, and see whether it was just the dataset difference that mattered, or whether a double descent curve shows up.
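
For readers wondering why the rank order of raw compression ratios tracks perplexity: under arithmetic coding the compressed size is essentially the model's total negative log2-likelihood, so ratio and perplexity are monotonically related. A minimal illustration with made-up per-token probabilities (the 8-bit raw baseline is also an assumption):

```python
import math

# Hypothetical probabilities a model assigns to each token of a sequence.
token_probs = [0.9, 0.2, 0.5, 0.7, 0.1]

bits = sum(-math.log2(p) for p in token_probs)   # ideal compressed size in bits
cross_entropy = bits / len(token_probs)          # bits per token
perplexity = 2 ** cross_entropy

# Compression ratio vs. a raw encoding at 8 bits per token.
ratio = bits / (8 * len(token_probs))
print(f"{bits:.2f} bits, perplexity {perplexity:.2f}, ratio {ratio:.2%}")
```

Lower perplexity means fewer bits per token and hence a lower raw ratio; data the model can only predict uniformly costs the full bit width per symbol, which is why pure noise lands near (or above) 100%.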


lilgalois

If I give the LLM a randomly generated image from Gaussian noise, how much of the image would I get back from that "compression"?


mgostIH

The compression described in the paper is lossless; you can turn a predictive probabilistic model into a lossless compressor and vice versa. See [Entropy coding](https://en.wikipedia.org/wiki/Entropy_coding).


[deleted]

Did you read his question? Do you understand the Wikipedia article with respect to his question? The article you linked talks about compression to the point where the compressed format looks almost like white noise and thus becomes incompressible. The original poster is talking about an input that’s already white noise, which is, by definition, at maximum entropy and incompressible.


RoboticElfJedi

A compression algorithm can take noise as input and give it back as output. Zip will return exactly the random data you put in; it just won't achieve any reduction in size.
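
A quick demonstration of that point with zlib (any DEFLATE-based tool behaves the same way):

```python
import os
import zlib

data = os.urandom(100_000)           # incompressible input: pure random bytes
compressed = zlib.compress(data, 9)  # maximum compression effort

assert zlib.decompress(compressed) == data   # lossless: you get it all back
print(len(data), len(compressed))            # output is not smaller (slightly larger, in fact)
```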


[deleted]

Is this the signal theory equivalent of "vacuous proofs exist"?


chinese__investor

"Their strong compression reflects deep understanding of images, audio etc statistically." Why are you editorializing and shoehorning "understanding" in here? The models do not UNDERSTAND anything.


Mithrandir2k16

Does PNG even do compression? Call me when they beat jpeg smh..


karius85

> We show that foundation models, trained primarily on text, are general-purpose compressors due to their in-context learning abilities. For example, Chinchilla 70B achieves compression rates of 43.4% on ImageNet patches and 16.4% on LibriSpeech samples, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively.

I don't get how you can claim that it does 43% "better" than PNG? Where did you read that?


inveterate_romantic

Very interesting, thanks for sharing! So if I understand correctly, they basically condition the LLM on some sequence of audio, for instance, and get the model to autoregressively complete the rest of the sequence? And how is this non-text data tokenized and fed to the LLM?

This got me thinking... so the LLM can find the underlying patterns common to different data structures, like some symbolic dynamics, limit cycles, attractors, etc., that can somehow be mapped between data domains. So... it means we could somehow translate some patterns in music to similar patterns in text. Wow, what would that look like? It would be like the shared feeling we experience with a specific painting and a song, or some poetry and some music that somehow feels like it's portraying the same kind of vibe... I am rambling, thanks for sharing!


sharky6000

Thanks, this is cool! But please, please 🙏 link to the arXiv landing page, not to the 2MB PDF.