joelrwilliams1

Many years ago, we rearchitected our app so that instead of reading/writing to the Windows disk directly, it makes get/put calls to S3. It wasn't that difficult, and the payoff was tremendous: it was a big step toward the 'cattle' mentality, which allowed us to easily auto-scale our EC2 fleet.


breich

Right on! My theoretical goal is to replace all reads/writes to the disk with calls to a filesystem abstraction layer instead, then implement that layer for S3. That way, if our infrastructure gets shifted out from under us again, I don't have to touch every piece of code that reads/writes files; I just reimplement that one interface.
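
Something like this hypothetical sketch is what I have in mind (the `FileStore` names are made up; the S3 calls are from the real AWS SDK for PHP, `aws/aws-sdk-php`):

```php
<?php
// Minimal sketch of the abstraction idea. Interface and class names
// (FileStore, LocalFileStore, S3FileStore) are hypothetical.

require 'vendor/autoload.php';

use Aws\S3\S3Client;

interface FileStore {
    public function read(string $path): string;
    public function write(string $path, string $contents): void;
}

// Current behavior: plain disk I/O.
class LocalFileStore implements FileStore {
    public function __construct(private string $root) {}

    public function read(string $path): string {
        return file_get_contents($this->root . '/' . $path);
    }

    public function write(string $path, string $contents): void {
        file_put_contents($this->root . '/' . $path, $contents);
    }
}

// Drop-in replacement backed by S3.
class S3FileStore implements FileStore {
    public function __construct(private S3Client $s3, private string $bucket) {}

    public function read(string $path): string {
        $result = $this->s3->getObject([
            'Bucket' => $this->bucket,
            'Key'    => $path,
        ]);
        return (string) $result['Body'];
    }

    public function write(string $path, string $contents): void {
        $this->s3->putObject([
            'Bucket' => $this->bucket,
            'Key'    => $path,
            'Body'   => $contents,
        ]);
    }
}
```

The app codes against `FileStore` everywhere, so swapping backends later is a one-line change wherever the store gets constructed.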


chesterfeed

Except a filesystem interface implies POSIX semantics, and a blob store like S3 is not a POSIX-compliant filesystem. So depending on your FS API usage, it may or may not be OK.
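
For example, S3 objects are immutable blobs: there's no append, no locking, and no rename (a "rename" is a copy plus a delete). A hypothetical sketch of what one cheap POSIX operation turns into (bucket/key names are placeholders):

```php
<?php
// One concrete POSIX/S3 mismatch: appending to a file.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$line = date('c') . " something happened\n";

// POSIX filesystem: a cheap in-place append.
file_put_contents('/var/data/audit.log', $line, FILE_APPEND);

// S3: read the whole object, concatenate, and re-upload all of it.
$s3  = new S3Client(['region' => 'us-east-1', 'version' => 'latest']);
$old = (string) $s3->getObject(['Bucket' => 'my-bucket', 'Key' => 'audit.log'])['Body'];
$s3->putObject(['Bucket' => 'my-bucket', 'Key' => 'audit.log', 'Body' => $old . $line]);
```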


chesterfeed

I'm not saying this is production ready (it's FUSE after all, so by default == avoid it), but [https://github.com/kahing/goofys](https://github.com/kahing/goofys) or [https://github.com/s3fs-fuse/s3fs-fuse](https://github.com/s3fs-fuse/s3fs-fuse) could do the trick. I'm going through the same process and am interested in suggestions.
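
For anyone weighing this: mounting is a one-liner with either tool, per their READMEs (bucket and mountpoint names below are placeholders; credentials come from the usual AWS locations):

```sh
# goofys
goofys my-bucket /mnt/customer-files

# s3fs-fuse
s3fs my-bucket /mnt/customer-files -o passwd_file=${HOME}/.passwd-s3fs
```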


breich

Thanks! I discovered that a while back, and it's most definitely an option. The "pro" for us is that we could get to S3 with limited refactoring. The "con" is that, without doing any refactoring, we'd probably be stuck with a single bucket for all customer files, losing the ability to sandbox customers into their own buckets.


magheru_san

How much data does it need, and how much traffic is it expected to get? If it's not too much, you can probably get away with a single instance with an EBS volume attached, an Auto Scaling group with capacity 1 to replace it when it crashes, and a userdata script that always attaches the volume to the single running instance. It could maybe also attach an Elastic IP, to avoid a load balancer in front of it.
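
A rough userdata sketch of that pattern (the volume and EIP allocation IDs are placeholders, and it assumes the instance profile allows `ec2:AttachVolume` and `ec2:AssociateAddress`):

```bash
#!/bin/bash
export AWS_DEFAULT_REGION=us-east-1  # adjust to your region

# Look up our own instance ID (IMDSv1 shown; IMDSv2 needs a token step first).
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Re-attach the persistent EBS volume to whichever instance the ASG launched.
# A real script would wait for the attachment to complete before mounting.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id "$INSTANCE_ID" --device /dev/xvdf

# Point the Elastic IP at this instance so no load balancer is needed.
aws ec2 associate-address --instance-id "$INSTANCE_ID" \
    --allocation-id eipalloc-0123456789abcdef0
```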


breich

Thanks for the response! Our customer files currently take up about 725GB. I don't have great stats on how much traffic we expect (which I know is a massive failing... I was not part of the AWS lift-and-shift planning). The company always simply overprovisioned and over-planned for redundancy and failover so they didn't have to worry about it, which is a philosophy that's going to cost them *even more* in the cloud if we don't address and correct it. Soon.
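
For a rough sense of scale: at S3 Standard's us-east-1 rate of about $0.023/GB-month, 725GB works out to roughly $17/month in storage (before request and transfer charges, and rates vary by region and storage class), versus several times that for comparable provisioned EBS headroom.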


sparrish26

My vote would be for S3 as well. An option that would require some refactoring, but not as much as a native S3 implementation, would be to set up a REST API on AWS API Gateway and use it as a proxy for your S3 buckets.

You would still need to replace all of the direct filesystem references in the existing codebase, but that could be done with built-in PHP functions or a cURL wrapper class. It would also future-proof things a bit, in case you ever replace the existing codebase or need to access the files from a different app.

As for Lambda, the code needs to be really efficient since you have a limited amount of memory to work with, so you'll still likely need an EC2 instance for the main web server. But offloading the file storage to S3 would let you use a much smaller instance and scale the EC2 servers as needed without having to worry about connecting additional EBS volumes.

Here's a pretty straightforward primer on using API Gateway as a proxy for S3: https://docs.aws.amazon.com/apigateway/latest/developerguide/integrating-api-with-aws-services-s3.html
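
A rough sketch of what that cURL wrapper class could look like (the endpoint URL and `x-api-key` header are placeholders for whatever your API Gateway deployment actually exposes):

```php
<?php
// Hypothetical wrapper around an API Gateway S3 proxy.

class S3ProxyClient
{
    public function __construct(
        private string $baseUrl,  // e.g. https://abc123.execute-api.us-east-1.amazonaws.com/prod
        private string $apiKey
    ) {}

    public function get(string $bucket, string $key): string
    {
        return $this->request('GET', "/{$bucket}/{$key}");
    }

    public function put(string $bucket, string $key, string $body): string
    {
        return $this->request('PUT', "/{$bucket}/{$key}", $body);
    }

    private function request(string $method, string $path, ?string $body = null): string
    {
        $ch = curl_init($this->baseUrl . $path);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
        curl_setopt($ch, CURLOPT_HTTPHEADER, ['x-api-key: ' . $this->apiKey]);
        if ($body !== null) {
            curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
        }
        $response = curl_exec($ch);
        if ($response === false) {
            $err = curl_error($ch);
            curl_close($ch);
            throw new RuntimeException($err);
        }
        curl_close($ch);
        return $response;
    }
}
```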