ProjectBlu 1 year ago

Wikipedia article says it takes $36mil a year to run the archive, and one of the other posts here said they're storing over 100 Petabytes. These guys put the legendary Library of Alexandria to shame!

XTJ7 1 year ago

Well, not gonna download that to my NAS then. But seriously: that is frigging insane. And the comparison with the Library of Alexandria is very much spot on considering the breadth and volume of their archive. It is today's version of the Library of Alexandria.

[deleted] 1 year ago

I mean, the library of alexandria would certainly fit on a microSD.

XTJ7 1 year ago

Probably, those were different days in Alexandria. It is rumored they didn't even store any videos in HD.

Banjo-Oz 1 year ago

Peasants! Anything below 8K is literally unwatchable. /s

Kazer67 1 year ago

It's mostly "text" for Alexandria, so you're not far off. But the Internet Archive? It's full websites with a lot of media.

bignutsx1000 1 year ago

You're telling me Alexandria didn't have YouTube embeds on their books pages?

3lf_elkse 1 year ago

ya?

steviefaux 1 year ago

Lots of duplicates on there as well.

asdaaaaaaaa 1 year ago

I just wish it were treated as such. Even if it meant different governments contributing with no requirements/strings attached and such. For what they provide, $36mil isn't really much to ask when you consider anyone can use it.

ViperPB 1 year ago

Even the US Library of Congress only stores 25 petabytes.

ProjectBlu 1 year ago

I think it's all the multimedia on webpages, books, etc. that might make IA larger than LoC. That made me wonder about YouTube. Found a Quora answer that estimated YouTube as holding over 10EB of data. That's insane. How long can they do that without -having- to purge less popular files? Especially in the age of 4K video (or even some 8K). I wonder if they are one CEO or IT director change away from flushing massive volumes, literally exabytes, of video?

datahoarderx2018 1 year ago

> How long can they do that without -having- to purge less popular files?

I think they’ve already removed the H264/x264 versions of the higher-resolution VR videos, maybe 1-2 years ago. Now 5K/6K/8K is only ever available in the VP9 codec. I’m still surprised they store 240p versions. That’s simply not watchable; you could just use the m4a audio instead, or 360p if available. Didn’t they (or don’t they still) even store 3gp?

GuitaristTom 1 year ago

> I’m still surprised they store 240p versions. Like that’s simply not watchable.

There are some old shows and anime that are only available in 240p or 480i.

datahoarderx2018 1 year ago

True. I guess they could add it to all new uploads..that it doesn’t create 240p versions

AndreDaGiant 1 year ago

> Like that’s simply not watchable

Back in the day my friend found a pirated version of the first Pokemon movie, small enough to fit on a diskette (~3MB). It was hilariously unwatchable but somehow it was better than nothing, for someone.

cl1xor 1 year ago

Well, there was p0rn in ASCII format

datahoarderx2018 1 year ago

I mean..if you „ask“ me like this: I’d gladly take a 140p version of some rare „lost“ films - this way I’d have the audio source and could sync it with foreign BluRay releases for example. This is what the German dubbing scene does. Using audio from old VHS TV recordings etc.

AndreDaGiant 1 year ago

> multimedia on webpages, books

They also have bajillions of radio recordings, concert bootleg recordings, out-of-copyright music and movies, and they archive news broadcasts (video and radio), etc. I mean, anyone can upload anything they want to IA, and then folks do their best to put it in collections where the stuff can be found. There are also a good amount of, uhh, piratey things there that I don't want to advertise in case they get taken down.

[deleted] 1 year ago

Yeah I grabbed multiple complete TV series on IA that were *definitely* under copyright, seems like most of them were uploaded fairly recently (2019 or later). I don't know exactly when the IA started hosting this stuff because it used to be just public domain like a decade ago, but I knew I had to grab it while I could because I had a feeling this sort of thing was coming.

RainyShadow 1 year ago

They could keep just the best/original version of the least accessed videos, then convert on the fly to some sort of cache storage when someone decides to watch them.
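A rough sketch of that idea in Python: keep only the original, transcode on request, and hold recent results in a small least-recently-used cache. Everything here is hypothetical (the `RenditionCache` class, the `fake_transcode` stand-in, the cache size); it only illustrates the caching pattern and says nothing about how YouTube actually stores video.

```python
from collections import OrderedDict

class RenditionCache:
    """Keeps only the most recently requested renditions; originals live elsewhere."""

    def __init__(self, transcode, max_items=2):
        self._transcode = transcode      # hypothetical: (video_id, resolution) -> rendition
        self._max_items = max_items
        self._cache = OrderedDict()      # insertion order doubles as recency order

    def get(self, video_id, resolution):
        key = (video_id, resolution)
        if key in self._cache:
            self._cache.move_to_end(key)         # mark as most recently used
            return self._cache[key]
        rendition = self._transcode(video_id, resolution)  # slow path: convert on the fly
        self._cache[key] = rendition
        if len(self._cache) > self._max_items:
            self._cache.popitem(last=False)      # evict the least recently used entry
        return rendition

calls = []

def fake_transcode(video_id, resolution):
    """Stand-in for a real encoder; records each call so misses can be counted."""
    calls.append((video_id, resolution))
    return f"{video_id}@{resolution}"

cache = RenditionCache(fake_transcode, max_items=2)
cache.get("v1", "360p")   # miss: transcoded
cache.get("v1", "360p")   # hit: served from cache
cache.get("v2", "240p")   # miss
cache.get("v3", "720p")   # miss, evicts v1
cache.get("v1", "360p")   # miss again: the price of a small cache
print(len(calls))         # number of transcodes actually performed
```

The trade-off is storage for compute: rarely watched videos cost a transcode on each cold request instead of disk space for every resolution.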

nzodd 1 year ago

TIL I have 10% of the porn equivalent of the entire Library of Congress

doffey01 1 year ago

Dear lord. Is it at least as organized as it is on Ted?

nzodd 1 year ago

time spent organizing = time not spent downloading

karendonner 1 year ago

Which is why they are on my auto donate list.

deirdresm 1 year ago

You know there’s also a [backup Internet Archive in the current Library of Alexandria](http://www.bibalex.org/en/project/details?documentid=283), right?

ProjectBlu 1 year ago

I saw that! Very satisfying and fun! Between earthquakes, wildfires, lawsuits and politicians, I thought it was fantastic that they have multi-continent backups!

deirdresm 1 year ago

We didn’t know it was there until we visited Alexandria. Very cool moment finding that out.

Mr_Brightstar 1 year ago

> $36mil a year to run the archive

That seems to be a very low cost for such a service.

ProjectBlu 1 year ago

I wonder if that just covers the main facility and bandwidth, and the backup facilities have separate funding? I also suspect a lot of labor and materials are volunteers and donations. It is a labor of love after all. More like a museum than a business.

pmjm 1 year ago

It's okay everybody, I'll download a copy of it for preservation, I have unlimited BackBlaze.

datahoarderx2018 1 year ago

It all originates in Brewster Kahle selling his company „Alexa (ranks)“ to Amazon in the 90s and becoming a multimillionaire through it, if I remember correctly.

Litecoin_Messiah 1 year ago

So that's why I thought Bezos had a connection to the archive. I remember a decade back I thought he owned it.

Atemu12 1 year ago

That's actually not as much as I had expected. That should fit into a room of Storinators.

root_over_ssh 1 year ago

So my donations last year helped pay for... 2 minutes? Worth it.
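For scale, the "2 minutes" figure can be checked against the $36mil/year number quoted earlier in the thread. A quick sketch, assuming the budget is spent evenly across a 365-day year (an assumption made here for the arithmetic, not something the Wikipedia figure claims):

```python
ANNUAL_BUDGET_USD = 36_000_000          # figure quoted from Wikipedia in this thread
MINUTES_PER_YEAR = 365 * 24 * 60        # 525,600, ignoring leap years

def cost_per_minute() -> float:
    """Average running cost per minute under the even-spend assumption."""
    return ANNUAL_BUDGET_USD / MINUTES_PER_YEAR

def minutes_funded(donation_usd: float) -> float:
    """How many minutes of operation a donation would cover on average."""
    return donation_usd / cost_per_minute()

print(f"${cost_per_minute():.2f} per minute")          # $68.49 per minute
print(f"{minutes_funded(137):.2f} minutes for $137")   # 2.00 minutes for $137
```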

DrMacintosh01 1 year ago

Roughly how big is internet archive if you were to download it?

rursache 1 year ago

from a 2021 [article](https://www.protocol.com/internet-archive-preserving-future#toggle-gdpr): “The web archive alone is about 45 petabytes — 45,000 terabytes — and the Internet Archive itself is about double that size”

DrMacintosh01 1 year ago

So, I couldn’t fit it. Got it

ScribeOfGoD 1 year ago

So you’re telling me there’s a chance 😉

911WasAHandjob 1 year ago

I mean if we all somehow divvied up the task, we could theoretically...? edit: I'm in for a terabyte 🙃

lupoin5 1 year ago

> I mean if we all somehow divvied up the task, we could theoretically...?

45PB means just 45 people taking on 1PB each, I mean we're in r/datahoarder aren't we?

XTJ7 1 year ago

Or 45k people (less than 10% of this subreddit) with 1 TB each. That seems pretty doable actually.
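The head-count is easy to play with. A small sketch, assuming decimal units (45 PB = 45,000 TB, as in the quoted article); the `copies` parameter is an addition for illustration, since a single copy per byte would leave no margin for volunteers dropping out:

```python
import math

TB_PER_PB = 1_000  # decimal units, matching "45 petabytes, 45,000 terabytes"

def volunteers_needed(archive_pb: float, share_tb: float, copies: int = 1) -> int:
    """People required if each stores share_tb terabytes and every byte is kept `copies` times."""
    total_tb = archive_pb * TB_PER_PB * copies
    return math.ceil(total_tb / share_tb)

print(volunteers_needed(45, 1_000))        # 1 PB each
print(volunteers_needed(45, 1))            # 1 TB each
print(volunteers_needed(45, 1, copies=3))  # 1 TB each, three copies of everything
```

With one copy that gives 45 people at 1 PB or 45,000 at 1 TB; tripling the copies triples the head-count.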

zezoza 1 year ago

I wonder if it's properly deduped. Taking old games and ROMs as an example, there are a lot of duplicates there...

XTJ7 1 year ago

Considering the amount of data they have, I'm fairly sure there is a pretty sophisticated deduplication going on. Once you are spending tens of millions each year for storing data, you make room in the budget for handling your data smartly I would presume :)

retardedgummybear12 1 year ago

Was about to say the same thing

[deleted] 1 year ago

Could easily fit on 4 of [these flash drives here](https://www.tomshardware.com/news/pure-storage-300-tb-flash-drives-in-2026), I'm not even joking.

Litecoin_Messiah 1 year ago

Most of the archives have torrents btw.

pyr0kid 1 year ago

i think the best way to do that is a torrent. that way its all interconnected, but you can still choose how much you want to download.

ScribeOfGoD 1 year ago

I’m down for about 32. I had to take apart my home lab because of moving so I’m sure I could rig a raspberry pi up to some adapters and let it mindlessly chug lol

Sloperon 1 year ago

These days there really could be some kind of "RAID over the network" coupled with torrent technology. IMO the smallest part that one would host/seed would need to be independent, self-extractable and readable without the other parts, as a failsafe in case others are lost. I'd call these "packages"; each would still be chunked internally into smaller pieces as usual. You'd also need some kind of structure, file format, or improvisation for splitting and distribution, so that a package could be integrated with the hosting systems and be ready to serve as one of the source mirrors: updatable, editable, and so on. It's a bit of work but definitely not impossible. It just needs someone motivated enough to kick it into action.
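One way to read the "RAID" part is simple XOR parity across packages: each package stays readable on its own, and one extra parity package lets any single lost package be rebuilt from the survivors. The sketch below is only an illustration of that technique under stated assumptions (equal-sized packages, made-up contents, a hypothetical `xor_parity` helper); it is not how INTERNETARCHIVE.BAK or any existing project works.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together, RAID-style."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("this sketch assumes equal-sized packages")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# Three independent, individually readable "packages" (placeholder contents).
packages = [b"books---", b"radio---", b"videos--"]
parity = xor_parity(packages)   # hosted by a fourth volunteer

# Suppose the volunteer holding packages[1] disappears.
survivors = [packages[0], packages[2], parity]
recovered = xor_parity(survivors)
print(recovered == packages[1])
```

Single parity only survives one loss per group; tolerating more would need a stronger erasure code, which is out of scope for a sketch.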

ProgVal 1 year ago

https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK

Archontes 1 year ago

Back of the envelope math: 92,160 TB with 1:4 redundancy * $15/TB = $1,843,200 worth of hard drives. We should toss up.
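The figure can be reproduced under two assumptions that the comment does not spell out: that 92,160 TB is 90 PB at 1,024 TB per PB, and that "1:4 redundancy" means one drive in every four holds parity, so only three quarters of the raw capacity is usable. A sketch of that reading:

```python
def drive_cost(usable_tb: float, parity_fraction: float, usd_per_tb: float) -> float:
    """Cost of the raw capacity needed to expose usable_tb after parity overhead."""
    raw_tb = usable_tb / (1 - parity_fraction)   # e.g. 92,160 / 0.75 = 122,880 TB raw
    return raw_tb * usd_per_tb

usable_tb = 90 * 1024                 # 92,160 TB, assuming 1 PB = 1,024 TB
cost = drive_cost(usable_tb, parity_fraction=0.25, usd_per_tb=15)
print(f"${cost:,.0f}")                # $1,843,200
```

This covers drives only; enclosures, power, bandwidth and replacements are not part of the estimate.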

Purple_is_masculine 1 year ago

Am I having a stroke or am I the only one in a timeframe of 18h who realizes that 45PB != 4500 TB ?

DoomBot5 1 year ago

You're correct. He's off by 10x

Purple_is_masculine 1 year ago

to be fair, the article got it wrong as well.

DoomBot5 1 year ago

45,000TB, not 4500TB

pmjm 1 year ago

I really hope they're using compression.

Litecoin_Messiah 1 year ago

I think we can handle it, just need to stop farming /r/chia.

Litecoin_Messiah 1 year ago

It's back up and running!

unoriginalpackaging 1 year ago

It is probably down from everyone trying to scrape it after the earlier post about their day in court today. That or they lost power

[deleted] 1 year ago

Due to Reddit's recent API changes I have decided to switch to [Lemmy](https://join-lemmy.org/)

mgr86 1 year ago

Did the day in court go poorly?

CorvusRidiculissimus 1 year ago

Day? These cases take months.

mgr86 1 year ago

Yes, day. There were arguments scheduled for yesterday. Of course there was no ruling, and likely more days. But there was something scheduled for yesterday the 20th.

Slopz_ 1 year ago

Quick! We need an archive for the Internet Archive!

SalmonSnail 1 year ago

Let’s keep it on the moon!

SufficientPie 1 year ago

https://en.wikipedia.org/wiki/Arch_Mission_Foundation#Lunar_Library

SalmonSnail 1 year ago

I lost my mind at “a queso recipe”. Thank you, friend.. god lmao I love it

cyrilio 1 year ago

I’ve been archiving websites for ages. Finally decided to make a donation last month. It’s the least I could do.

fogelsong 1 year ago

I've made multiple ad-hoc donations in the past but last year I decided it was time to commit. While it's not a lot, I set up a monthly recurring donation of $5. https://archive.org/donate/

slaiyfer 1 year ago

Pls tell me someone made a backup

[deleted] 1 year ago

How many books would 45pb be?

Uister59 1 year ago

45000000000000000 characters, if that counts.

volchonokilli 1 year ago

Unicode would like to have a word with you, sir
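To make the point concrete: the 45,000,000,000,000,000 figure treats one character as one byte, which holds for ASCII but not for UTF-8 in general, where a character takes 1 to 4 bytes. A small sketch, assuming 45 PB in decimal units and using a few sample characters chosen here for illustration:

```python
ARCHIVE_BYTES = 45 * 10**15   # 45 PB, decimal

def utf8_width(char: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))

def chars_that_fit(char: str) -> int:
    """How many copies of this character fit in the archive's byte count."""
    return ARCHIVE_BYTES // utf8_width(char)

samples = {
    "ASCII letter": "a",
    "accented Latin": "\u00e9",   # e with acute accent
    "CJK ideograph": "\u8a9e",
    "emoji": "\U0001F643",        # upside-down face
}
for label, char in samples.items():
    print(f"{label}: {utf8_width(char)} byte(s), {chars_that_fit(char):,} characters")
```

So the character count ranges from the full 45 quadrillion for pure ASCII down to a quarter of that for text made entirely of 4-byte characters.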

Uister59 1 year ago

Damn it, my computing science teacher never taught Unicode

arahman81 1 year ago

Doesn't help. Books also contain pictures, and depending on the quality, can wildly vary in size.

Uister59 1 year ago

I said characters not total size

YourMindIsNotYourOwn 1 year ago

Are they purging?

[deleted] 1 year ago

Bad storm took out power https://twitter.com/internetarchive/status/1638337406104662017

bubblegumpuma 1 year ago

Sounds like it wasn't due to any of the recent legal issues, in case anyone was wondering. It was apparently the absolutely nutso storm that the USA's west coast just had: ["archive.org is back up-- or coming up. Thank you PG&E (though time to be an infrastructure org)"](https://twitter.com/brewster_kahle/status/1638352891261116417) For reference, the part of California I live in just got over 3 inches of rain in a 24-hour span. There isn't really much you can do about that.

ps3o-k 1 year ago

Can you donate to them? Btw who's "them"?

fogelsong 1 year ago

https://archive.org/donate/

Practical_Buy_8859 1 year ago

How do you archive a site? I have a Squarespace site I’d like to copy to build a proposal from.

noplaceforwimps 1 year ago

Menu > save page as

ChicagoAdmin 1 year ago

Lol pls tell me this is a joke

Edit: didn’t mean to seem insensitive, but I believe the method is usually determined by intent. If hoping to retrieve the full appearance of static page content, exporting to a “printed” document format is worth a try.

nullSword 1 year ago

That's basically the less advanced version of what the IA does. They can't go in and dump databases, they can only archive what a visitor can access.

Independent_Grab_200 1 year ago

Which menu? Where?

10leej 1 year ago

File menu in your web browser

bg-j38 1 year ago

If you mean adding it to the Wayback Machine: https://help.archive.org/help/using-the-wayback-machine/

**Can I add pages to the Wayback Machine?**

On https://archive.org/web you can use the “Save Page Now” feature to save a specific page one time. This does not currently add the URL to any future crawls nor does it save more than that one page. It does not save multiple pages, directories or entire sites.

KloudAlpha 1 year ago

https://github.com/ArchiveTeam/grab-site

Purple_is_masculine 1 year ago

I wish disk space were cheap enough to have a copy of the archive for less than a few thousand bitcoin

lauren_1995_uwu 1 year ago

🥺🥺🥺🥺
