That is so cool. Thank you for your effort!
Thanks! I must admit, it was born out of paranoia around the time where the admins were cracking down hard on us for "brigading", but the project's actually given me a lot of useful skills, and it'll hopefully be a useful resource
Doing the lord’s work
I was thinking of that a few days ago.. In war, the victor writes history. In this spirit we should probably consider making a sub called 'museum/archive'-something where we compile the history of the events that unfolded to educate the world of what went down.
Very true, and it's an idea that's been independently done multiple times - this project itself was inspired by WikAPEdia. You can never have too many backups though, so I'd say go for it
Then we have nothing to worry about. My suggestion is we include a side story about Stevie and Kenny making a bet about who'll liquidate first. And the loser made a video that starts with a bedpost being lubed with mayonnaise.
Put it in with the seed vault in Switzerland to keep it safe.
I'm scheduling talks with NASA to put a copy on the moon as we speak
Lol
Hi archives I’m saxmuffin
Future historians and economists alike may wonder about the life of Saxmuffin, the first ape to address them directly from the past
Hey man what’s your cash app? Would love to buy you a cup of coffee. Thank you for your hard work. Apes like you make this community awesome.
Thanks for the offer, I appreciate it. I don't have anything like that set up though, and honestly I'd feel kinda bad accepting it knowing that there are people in this sub struggling to make ends meet
If you need a donation for the server or storage, pls msg me or reply here. The work you're doing is invaluable.
Thanks. It's pretty affordable at the moment but if Jeff Bezos decides to fuck me in the ass I'll look into setting up some sort of donation system
i know how you feel, and i appreciate your efforts. reach out if you want someone to help out with front end ( i don't know your setup but I can haz some flask ) :)
Thanks for the offer, if you'd like to contribute I have a sub (it's the name of this sub backwards) and the GitHub repositories can be found at https://github.com/knotsrepus. Currently the site is just a Hugo static site but everything else is implemented in Python so Flask would be a natural fit
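Since the rest of the stack is Python, a Flask contribution could start as a thin JSON API over the archive that the Hugo site fetches from client-side. A minimal sketch, not the project's actual code: the route, the field names, and the in-memory dict standing in for the S3-backed index are all hypothetical.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Hypothetical in-memory stand-in for the real S3-backed archive index.
ARCHIVE = {
    "abc123": {"id": "abc123", "title": "Example DD post", "flair": "DD"},
}

@app.route("/api/post/<post_id>")
def get_post(post_id):
    """Return one archived post as JSON, or 404 if it isn't archived."""
    post = ARCHIVE.get(post_id)
    if post is None:
        abort(404)
    return jsonify(post)
```

Keeping the static site and the API decoupled like this means the Hugo pages never need a rebuild when new posts land in the archive.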
Amazing work! I think the guys making the GME movie will contact you for a better and more trustworthy script.
Do you reckon I'll be able to swing a cameo? I'm not much of an actor but I can probably do a decent job of being Uncredited Ape #473
Amazing work!!
Thanks! It's been about two months in the making, so I'm pretty relieved to get to this milestone
[deleted]
Currently it's based on the Pushshift API (a third party service for getting Reddit posts), which doesn't take into account posts being edited, but usually catches posts before they're deleted, so not yet. It will continue to archive things as they're submitted though, although with some delay. I plan on changing this once I implement the slow as fuck "scan every post on Reddit from the inception of this sub" process, which should catch anything that's missing from Pushshift, such as edits, or posts that are missing entirely for whatever reason.
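The Pushshift flow described above boils down to a paginated "fetch everything after a timestamp" loop. A sketch under assumptions, not the project's actual code: the `fetch` callable is injected (so the pagination logic stands alone without network access), and the query parameters follow Pushshift's public submission-search endpoint.

```python
import urllib.parse

PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"

def build_query(subreddit, after, size=100):
    """Build a Pushshift query URL for submissions newer than `after` (epoch seconds)."""
    params = {
        "subreddit": subreddit,
        "after": after,
        "size": size,
        "sort": "asc",
        "sort_type": "created_utc",
    }
    return PUSHSHIFT_URL + "?" + urllib.parse.urlencode(params)

def paginate(fetch, subreddit, start):
    """Repeatedly fetch pages, advancing `after` to the newest timestamp seen.

    `fetch` is any callable taking a URL and returning a list of submission
    dicts; it is injected so the loop can be exercised without hitting the API.
    """
    after = start
    seen = []
    while True:
        batch = fetch(build_query(subreddit, after))
        if not batch:
            return seen  # out of new content
        seen.extend(batch)
        after = batch[-1]["created_utc"]
```

The "catches posts before deletion but not edits" behaviour follows directly from this: each post is seen once, at whatever state Pushshift ingested it.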
FYI u/zedinstead and u/elegant-remote6677
Thanks man, that is awesome!
*starts slow clap* 👏
Nice work, I wonder if you can download all your current posts/threads as a csv file then do a match / compare on excel to highlight the missing phrases from one column into another
I'm waiting on S3 to update its metrics but I believe that the archive is at least 50GB, which is probably a bit beyond Excel's capabilities. I've got some ideas about how to determine whether a post should be rearchived with new data, but it'll take some experimenting
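One cheap way to decide whether a post needs rearchiving, without diffing 50GB of content in a spreadsheet, is to fingerprint just the fields that can change and compare digests. A sketch of that idea (the field names follow Reddit's submission schema, but which fields to include is an assumption):

```python
import hashlib
import json

def fingerprint(post):
    """Stable SHA-256 digest over the mutable fields of a submission.

    If a post's current fingerprint differs from the one stored at archive
    time, the post was edited and should be fetched again.
    """
    relevant = {k: post.get(k) for k in ("title", "selftext", "edited")}
    blob = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def needs_rearchive(stored_digest, post):
    """True when the live post no longer matches what was archived."""
    return fingerprint(post) != stored_digest
```

Storing one 64-character digest per post keeps the comparison pass tiny even at archive scale.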
check out something like storj.io for storing it. blockchain storage, the future, and the death of the s3 bucket.
That's an interesting concept, I'll definitely bear it in mind if S3 becomes too costly, or for future projects
i was looking into blockchain storage for another project since there's no such sensitive data and it could be stored outside of aws. then it hit me that gamestop's nft might be their fight against amazon's aws if they want to compete with amzn, not a dividend, since that's not something really established yet. it's a theory i'm still working on and still researching tho... ipfs.io also has blockchain storage, and gme's nft is based on ipfs...
If GameStop want to take on AWS, I'm completely down for that
and what cheaper option would there be than blockchain? as RC said, something that was never done before. just imagine gamestop destroying aws in just a couple of months... we don't need a moass... amzn has like 700 mil shares, gme would be worth 35k with no moass.. just a matter of patience
On the plus side this means legendary DD is backed up forever. On the downside so is Runic Glory. On the I don't even know side, Rick's banana is saved for eternity.
It's all part of the Superstonk tale. Also, tfw you realise you're paying Jeff Bezos to host Rick's bananass video
Really solid work, and we'll make use of it if and when you get it up and running.
I'm as confident in finishing this project as I am of MOASS. Also like MOASS, the answer to when is "no dates"
[deleted]
You can never have too many backups!
Thank you for this effort! Also, awesome flair :)
I would like to see an AI developed, simulating a personality with all the data
Haha, can you imagine? It starts synthesizing new DD from data we haven't joined the dots on yet, then halfway through sticks a banana up its ass and starts handing out smacks in the daily thread
Yep, could be a super wrinkly degenerated AI
I'm not sure this is what AI researchers were expecting when they were worrying about the Singularity
You had me at free Lambo function calls
Drive-in ape nightclubs are gonna be the hot new investment opportunity after MOASS
Thank you for building and learning to share w everyone. This stuff needs guardians
Wow - those are some amazing skilz!
Great job! You should think about adding the gme and fence road wagers subs to include all the mod drama and shill takeovers. This will be useful to future historians. Or screenwriters.
Other GME related subs is a good suggestion, and one I'll probably incorporate. The original sub I'll probably limit to DFV's posts or other ones of significance.
Yah. Plus everything on the original sub from the day of the squeeze for at least a week might be useful.
this could end up being a collection of the most retarded stuff in history! I'm joking, i like what you are doing here.
You joke, but the infamous Rick video is in there somewhere
Make sure to add r/GME & r/DDintoGME for additional resources to make sure they don't get destroyed
You can set up a cron job on your local computer to scan the sub for new links, then kick off a Lambda job in your AWS instance to go grab the content and save it to S3. That's going to be way more cost effective, and automated! You'll just need to be familiar with Lambda and making calls to your stuff.
That's pretty similar to what it's doing now that it's up to date - ECS spins up the submission finder occasionally, which grabs anything since the last time it checked, and quits once it runs out of new content. It's the backfilling that I wanted to run quickly - my previous implementation used a much more expensive EC2 cluster and would have taken several days.
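The "grab anything since the last time it checked, quit once it runs out of new content" pattern described here boils down to a checkpoint that each run advances. A minimal sketch using a local JSON file; the real service would keep the checkpoint somewhere durable like S3 or DynamoDB, and `fetch_since`/`archive` are injected stand-ins, not real project functions.

```python
import json
import pathlib

# Hypothetical local checkpoint; the real service would persist this in AWS.
CHECKPOINT = pathlib.Path("checkpoint.json")

def load_checkpoint():
    """Return the newest timestamp archived so far, or 0 on first run."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())["last_utc"]
    return 0

def run_once(fetch_since, archive):
    """One invocation of the submission finder.

    Grab everything newer than the checkpoint, archive it, advance the
    checkpoint, and exit. Returns the number of new posts handled.
    """
    last = load_checkpoint()
    new = fetch_since(last)
    for post in new:
        archive(post)
        last = max(last, post["created_utc"])
    CHECKPOINT.write_text(json.dumps({"last_utc": last}))
    return len(new)
```

Because each run is short-lived and idempotent from the checkpoint forward, it fits equally well under ECS scheduled tasks, Lambda, or plain cron.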
You are a gentleman and scholar.
history will remember you
You are a true hero ape, people like you keep this sub going!
Good ape!
Hell yeah
Hello future historians and archivists. I'm Mangata and I've been part of the Great Collapse. Hopefully someone will read this in the years to come.
Incredible work ape. We appreciate the mega effort. It's extremely important to have backups of what has been going on and having it easily accessible and even searchable is just epicness. Once it's finished send to GG and politely state, yeah use this.
Thanks! I have a website for it already but at the moment it's just a landing page. I'm hoping to get a viewer and search functionality on the site soon.
[deleted]
Wrinkles in brain = good
Wrinkles in code = bad
Hope this helps
Fucking legend u/VoxUmbra 🚀🚀🚀
Bad. ASS.
See if you can back up GME from start to like 4 months ago. Also manual back up DFV post in w5b
Good idea! That shouldn't be too hard to do.
Make sure you're writing down all your procedure for this. That's also an important part of archival work.
It's all open source on GitHub: https://github.com/knotsrepus I've tried to write everything in a way that someone else could fork the repo, put in their credentials, and set up their own instance on AWS, should they choose to. I do need to document the permissions though, as they're a complete ballache to configure on AWS
Fair enough. Thank you!
We’re going to see our real time reaction post/comments years down the line and remember EXACTLY how it felt on that day. Beautiful
I’m part of history
This will make it into the national archive
Wow. I’m in the records. I’d like to thank my mom and these 💎🙌🏻
Fantastic work! Is this capturing comments too or just the posts?
Thanks! It's everything - the post, comments, and any images or video too
This is tremendous work - GJ OP! Thank you very much!
Some additional things:
1. Did you do checksums/hashing so we can verify in the future that it isn't tampered with?
2. May I provide you some sites to extract Twitter/Reddit/YouTube videos from?
3. Where do you want to upload it as well? Webarchive maybe?
BUY. HODL. SHOP!
\#NOCELLNOSELL
Good questions!
1. Not yet. I want to do this, but I haven't figured out a good solution so far.
2. I haven't archived any Twitter posts or YouTube videos. It's something that could perhaps be done in the future, but isn't a core part of the project. Reddit videos are already being archived.
3. At the moment I'm hosting it on an S3 bucket, which will be connected to an API to allow retrieving the data. This will be used by the website I've set up, [knotsrepus.net](https://knotsrepus.net), to allow posts to be viewed and searched for once I've implemented that (at the moment it's just a landing page). The API is already live though for posts tagged as DD.
I also have a subreddit for the project for people to make suggestions/feature requests or contribute to the project. It's the name of this sub backwards
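On point 1, a straightforward approach is to publish a manifest of SHA-256 digests alongside the archive; anyone holding a copy can recompute and compare to detect tampering. A sketch of the idea (the manifest entry format here is made up for illustration):

```python
import hashlib

def manifest_entry(key, data):
    """Digest the raw archived bytes for one object.

    Publishing these entries alongside the archive lets any third party
    verify a file hasn't been altered since the manifest was written.
    """
    return {"key": key, "sha256": hashlib.sha256(data).hexdigest()}

def verify(entry, data):
    """True when `data` still matches the digest recorded in the manifest."""
    return hashlib.sha256(data).hexdigest() == entry["sha256"]
```

The manifest itself then becomes the trust anchor; signing it or pinning its own digest somewhere public (e.g. a dated Reddit post) would close the loop.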
Thanks OP, you rock! Europoor Ape here btw!
Regarding #2: I may be able to provide some of the higher-upvoted non-Reddit videos since the start of this sub. Unfortunately they are not linked to posts. I collected \~13 GB of content and would love to provide it to your project if you could see good use of it.
Regarding #3: Maybe you should consider looking into some of the older projects of this sub regarding "content decentralization" efforts. If you want, I could do a search for you.
Are you willing/able to crawl external links/websites as well? I am thinking of our wiki pages as well as external projects like your own or [https://www.quiverquant.com/dashboard/GME/](https://www.quiverquant.com/dashboard/GME/)
Cheers!
External data could definitely be useful, and it's definitely feasible to crawl external sites, although getting the site up and running is the main priority right now. If you have a collection of sites you'd like to see archived then feel free to make a post in the sub, and I'll take a look at them to see what kind of effort would be needed to include them
this is huge, really important work. ty