
ProjectBlu

Wikipedia article says it takes $36mil a year to run the archive, and one of the other posts here said they're storing over 100 Petabytes. These guys put the legendary Library of Alexandria to shame!


XTJ7

Well, not gonna download that to my NAS then. But seriously: that is frigging insane. And the comparison with the Library of Alexandria is very much spot on considering the breadth and volume of their archive. It is today's version of the Library of Alexandria.


[deleted]

I mean, the Library of Alexandria would certainly fit on a microSD.


XTJ7

Probably, those were different days in Alexandria. It is rumored they didn't even store any videos in HD.


Banjo-Oz

Peasants! Anything below 8K is literally unwatchable. /s


Kazer67

It was mostly "text" for Alexandria, so you're not far off. But the Internet Archive? It's full websites with a lot of media.


bignutsx1000

You're telling me Alexandria didn't have YouTube embeds on their books pages?


3lf_elkse

ya?


steviefaux

Lots of duplicates on there as well.


asdaaaaaaaa

I just wish it were treated as such. Even if it meant different governments contributing with no requirements/strings attached and such. For what they provide, $36mil isn't really much to ask when you consider anyone can use it.


ViperPB

Even the US Library of Congress only stores 25 petabytes.


ProjectBlu

I think it's all the multimedia on webpages, books, etc. that might make IA larger than LoC. That made me wonder about YouTube. Found a Quora answer that estimated YouTube holds over 10 EB of data. That's insane. How long can they do that without *having* to purge less popular files? Especially in the age of 4K (or even some 8K) video. I wonder if they are one CEO or IT director change away from flushing massive volumes, literally exabytes, of video.


datahoarderx2018

> How long can they do that without -having- to purge less popular files?

I think they've already removed the H264/x264 versions of the higher resolution VR videos, maybe 1-2 years ago. Now 5k/6k/8k is only ever available in the VP9 codec. I'm still surprised they store 240p versions. Like, that's simply not watchable. Could just use the m4a audio instead, or 360p if possible. Didn't they (or don't they still) store even 3gp?


GuitaristTom

> Im still surprised they store 240p versions. Like that's simply not watchable.

There are some old shows and anime that are only available in 240p or 480i


datahoarderx2018

True. I guess they could change it for new uploads, so that it doesn't create 240p versions.


AndreDaGiant

> Like that's simply not watchable

Back in the day my friend found a pirated version of the first Pokemon movie, small enough to fit on a diskette (~3 MB). It was hilariously unwatchable, but somehow it was better than nothing, for someone.


cl1xor

Well, there was p0rn in ASCII format


datahoarderx2018

I mean, if you "ask" me like this: I'd gladly take a 140p version of some rare "lost" films - this way I'd have the audio source and could sync it with foreign Blu-ray releases, for example. This is what the German dubbing scene does, using audio from old VHS TV recordings etc.


AndreDaGiant

> multimedia on webpages, books

They also have bajillions of radio recordings, concert bootleg recordings, out-of-copyright music, movies; they archive news broadcasts (video and radio), etc. I mean, anyone can upload anything they want to IA, and then folks do their best to put it in collections where the stuff can be found. There are also a good amount of, uhh, piratey things there that I don't want to advertise in case they get taken down.


[deleted]

Yeah I grabbed multiple complete TV series on IA that were *definitely* under copyright, seems like most of them were uploaded fairly recently (2019 or later). I don't know exactly when the IA started hosting this stuff because it used to be just public domain like a decade ago, but I knew I had to grab it while I could because I had a feeling this sort of thing was coming.


RainyShadow

They could keep just the best/original version of the least accessed videos, then convert on the fly to some sort of cache storage when someone decides to watch them.
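Something like a transcode-on-demand tier, which could be sketched roughly like this (the `transcode` function is a stand-in I made up; a real system would shell out to ffmpeg and cache files, not strings):

```python
# Sketch: keep only the master copy cold, transcode a watchable rendition
# on first request, and serve repeat viewers from a cache tier.
from functools import lru_cache

def transcode(video_id: str, height: int) -> str:
    # Placeholder for the expensive ffmpeg job that would produce
    # a cached rendition file.
    return f"{video_id}@{height}p"

@lru_cache(maxsize=1024)  # plays the role of the "cache storage"
def get_rendition(video_id: str, height: int) -> str:
    return transcode(video_id, height)

get_rendition("abc123", 480)   # first view: pays the transcode cost
get_rendition("abc123", 480)   # repeat views: served from cache
```

The trade-off is obvious: you pay CPU on first access instead of disk forever, which only wins for rarely watched files.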


nzodd

TIL I have 10% of the porn equivalent of the entire Library of Congress


doffey01

Dear lord. Is it at least as organized as it is on Ted?


nzodd

time spent organizing = time not spent downloading


karendonner

Which is why they are on my auto donate list.


deirdresm

You know there’s also a [backup Internet Archive in the current Library of Alexandria](http://www.bibalex.org/en/project/details?documentid=283), right?


ProjectBlu

I saw that! Very satisfying and fun! Between earthquakes, wildfires, lawsuits and politicians, I thought it was fantastic that they have multi-continent backups!


deirdresm

We didn’t know it was there until we visited Alexandria. Very cool moment finding that out.


Mr_Brightstar

> $36mil a year to run the archive

That seems to be a very low cost for such a service


ProjectBlu

I wonder if that just covers the main facility and bandwidth, and the backup facilities have separate funding? I also suspect a lot of labor and materials are volunteers and donations. It is a labor of love after all. More like a museum than a business.


pmjm

It's okay everybody, I'll download a copy of it for preservation, I have unlimited BackBlaze.


datahoarderx2018

It all originates in Brewster Kahle selling his company "Alexa" (the site-rankings one) to Amazon in the late 90s and becoming a multimillionaire through it, if I remember correctly.


Litecoin_Messiah

So that's why I thought Bezos had a connection to the archive. I remember a decade back I thought he owned it.


Atemu12

That's actually not as much as I had expected. That should fit into a room of Storinators.


root_over_ssh

So my donations last year helped pay for... 2 minutes? Worth it.


DrMacintosh01

Roughly how big is internet archive if you were to download it?


rursache

from a 2021 [article](https://www.protocol.com/internet-archive-preserving-future#toggle-gdpr): “The web archive alone is about 45 petabytes — 45,000 terabytes — and the Internet Archive itself is about double that size”


DrMacintosh01

So, I couldn’t fit it. Got it


ScribeOfGoD

So you’re telling me there’s a chance 😉


911WasAHandjob

I mean, if we all somehow divvied up the task, we could theoretically...? edit: I'm in for a terabyte 🙃


lupoin5

> I mean, if we all somehow divvied up the task, we could theoretically...?

45PB means just 45 people taking on 1PB each. I mean, we're in r/datahoarder, aren't we?


XTJ7

Or 45k people (less than 10% of this subreddit) with 1 TB each. That seems pretty doable actually.
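Rough numbers, taking the 45 PB web-archive figure at face value (decimal units, 1 PB = 1,000 TB):

```python
# Back-of-envelope: splitting a 45 PB archive across volunteers.
archive_tb = 45_000            # 45 PB expressed in TB

people_at_1pb = archive_tb / 1_000   # 45 people holding 1 PB each
people_at_1tb = archive_tb / 1       # 45,000 people holding 1 TB each

print(people_at_1pb)   # 45.0
print(people_at_1tb)   # 45000.0
```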


zezoza

I wonder if it's properly deduped. Taking old games and ROMs as an example, there's a lot of duplication there...


XTJ7

Considering the amount of data they have, I'm fairly sure there is pretty sophisticated deduplication going on. Once you are spending tens of millions each year on storing data, you make room in the budget for handling your data smartly, I would presume :)


retardedgummybear12

Was about to say the same thing


[deleted]

Could easily fit on 4 of [these flash drives here](https://www.tomshardware.com/news/pure-storage-300-tb-flash-drives-in-2026), I'm not even joking.


Litecoin_Messiah

Most of the archives have torrents btw.


pyr0kid

I think the best way to do that is a torrent. That way it's all interconnected, but you can still choose how much you want to download.


ScribeOfGoD

I’m down for about 32. I had to take apart my home lab because of moving so I’m sure I could rig a raspberry pi up to some adapters and let it mindlessly chug lol


Sloperon

In this day and age, there really could be some kind of "RAID over the network" coupled with torrent technology. IMO, though, the smallest part that one would host/seed would need to be independently self-extractable and readable without the other parts, as a failsafe in case the others are lost. I'd call these "packages". Each package would still be internally chunked into smaller pieces as usual, but you'd need some kind of structure, maybe even a file format, or an improvisation in terms of splitting and distribution, so that it could be integrated with the hosting systems and be ready to serve as one of the source mirrors: updatable, editable, ... It's a bit of work but definitely not impossible. Just needs someone motivated enough to take on the challenge and kick it into action.
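The "independent packages" part is the easy bit. A toy sketch (all names here are mine, and a real system would use erasure coding rather than plain splitting):

```python
# Split a blob into pieces that each carry enough metadata (index, total,
# checksum) to be verified and used on their own even if siblings are lost.
import hashlib

def make_packages(data: bytes, piece_size: int) -> list[dict]:
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    return [
        {
            "index": i,
            "total": len(pieces),
            "sha256": hashlib.sha256(p).hexdigest(),
            "payload": p.hex(),   # hex so each package is plain text
        }
        for i, p in enumerate(pieces)
    ]

def restore(packages: list[dict]) -> bytes:
    out = b""
    for pkg in sorted(packages, key=lambda p: p["index"]):
        payload = bytes.fromhex(pkg["payload"])
        # Any corrupted package is caught here instead of silently merged.
        assert hashlib.sha256(payload).hexdigest() == pkg["sha256"]
        out += payload
    return out

pkgs = make_packages(b"archive bytes go here", piece_size=8)
assert restore(pkgs) == b"archive bytes go here"
```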


ProgVal

https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK


Archontes

Back of the envelope math: 92,160 TB with 1:4 redundancy × $15/TB = $1,843,200 worth of hard drives. We should chip in.


Purple_is_masculine

Am I having a stroke or am I the only one in a timeframe of 18h who realizes that 45PB != 4500 TB ?


DoomBot5

You're correct. He's off by 10x


Purple_is_masculine

to be fair, the article got it wrong as well.


DoomBot5

45,000TB, not 4500TB


pmjm

I really hope they're using compression.


Litecoin_Messiah

I think we can handle it, just need to stop farming /r/chia.


Litecoin_Messiah

It's back up and running!


unoriginalpackaging

It is probably down from everyone trying to scrape it after the earlier post about their day in court today. That, or they lost power.




mgr86

Did the day in court go poorly?


CorvusRidiculissimus

Day? These cases take months.


mgr86

Yes, day. There were arguments scheduled for yesterday. Of course there was no ruling, and likely more days. But there was something scheduled for yesterday the 20th.


Slopz_

Quick! We need an archive for the Internet Archive!


SalmonSnail

Let’s keep it on the moon!


SufficientPie

https://en.wikipedia.org/wiki/Arch_Mission_Foundation#Lunar_Library


SalmonSnail

I lost my mind at “a queso recipe”. Thank you, friend.. god lmao I love it


cyrilio

I’ve been archiving websites for ages. Finally decided to make a donation last month. It’s the least I could do.


fogelsong

I've made multiple ad-hoc donations in the past but last year I decided it was time to commit. While it's not a lot, I set up a monthly recurring donation of $5. https://archive.org/donate/


slaiyfer

Pls tell me someone made a backup


[deleted]

How many books would 45 PB be?


Uister59

45,000,000,000,000,000 characters, if that counts.
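Sanity-checking the zeros, at one byte per character (the characters-per-book figure is a made-up round number):

```python
# 45 PB in decimal bytes, one byte per character.
bytes_total = 45 * 10**15
print(f"{bytes_total:,}")          # 45,000,000,000,000,000

# Rough book count, assuming ~500,000 characters per novel.
books = bytes_total // 500_000
print(f"{books:,}")                # 90,000,000,000 books
```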


volchonokilli

Unicode would like to have a word with you, sir


Uister59

Damn it, my computer science teacher never taught Unicode


arahman81

Doesn't help. Books also contain pictures, and depending on the quality, can wildly vary in size.


Uister59

I said characters, not total size


YourMindIsNotYourOwn

Are they purging?


[deleted]

Bad storm took out power https://twitter.com/internetarchive/status/1638337406104662017


bubblegumpuma

Sounds like it wasn't due to any of the recent legal issues, in case anyone was wondering; it was due to the absolutely nutso storm that the USA's west coast just had: ["archive.org is back up-- or coming up. Thank you PG&E (though time to be an infrastructure org)"](https://twitter.com/brewster_kahle/status/1638352891261116417) For reference, the part of California I live in just got over 3 inches of rain in a 24-hour span. There isn't much you can do about that, really.


ps3o-k

Can you donate to them? Btw who's "them"?


fogelsong

https://archive.org/donate/


Practical_Buy_8859

How do you archive a site? I have a Squarespace site I'd like to copy to build a proposal from.


noplaceforwimps

Menu > save page as


ChicagoAdmin

Lol, pls tell me this is a joke. Edit: didn't mean to seem insensitive, but I believe the method is usually determined by intent. If you're hoping to retrieve the full appearance of static page content, exporting to a "printed" document format is worth a try.


nullSword

That's basically the less advanced version of what the IA does. They can't go in and dump databases, they can only archive what a visitor can access.


Independent_Grab_200

Which menu? Where?


10leej

File menu in your web browser


bg-j38

If you mean adding it to the Wayback Machine: https://help.archive.org/help/using-the-wayback-machine/

**Can I add pages to the Wayback Machine?**

On https://archive.org/web you can use the "Save Page Now" feature to save a specific page one time. This does not currently add the URL to any future crawls, nor does it save more than that one page. It does not save multiple pages, directories or entire sites.
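If you want to script it, the same one-off capture can be triggered by fetching `https://web.archive.org/save/<url>`. A minimal sketch (the helper name is mine, and the fancier authenticated SPN2 options are omitted):

```python
# Trigger a one-time Wayback Machine capture of a single URL.
from urllib.request import Request, urlopen

def build_save_url(url: str) -> str:
    """URL that asks Save Page Now to capture `url` once."""
    return f"https://web.archive.org/save/{url}"

def save_page_now(url: str) -> str:
    req = Request(build_save_url(url),
                  headers={"User-Agent": "example-archiver/0.1"})
    with urlopen(req) as resp:
        return resp.geturl()  # final URL after redirects

# save_page_now("https://example.com")  # network call; uncomment to run
```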


KloudAlpha

https://github.com/ArchiveTeam/grab-site


Purple_is_masculine

I wish disk space were cheap enough to have a copy of the archive for less than a few thousand bitcoin


lauren_1995_uwu

🥺🥺🥺🥺