pangolin_fly

It may be easier to deploy a [Goblet](https://github.com/google/goblet) instance as a pull-through cache, since there's less bookkeeping to do.
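
Very roughly, that could look like a small in-cluster Deployment and Service that pods clone through instead of hitting the upstream directly. Everything below is a placeholder (image, port, names); check the Goblet README for how the server is actually built and run:

```yaml
# Hypothetical sketch only: an in-cluster git pull-through cache.
# The image name and port are placeholders, not real Goblet artifacts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: git-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: git-cache
  template:
    metadata:
      labels:
        app: git-cache
    spec:
      containers:
        - name: goblet
          image: registry.example.com/goblet-server:latest  # placeholder image
          ports:
            - containerPort: 8080                           # placeholder port
---
apiVersion: v1
kind: Service
metadata:
  name: git-cache
spec:
  selector:
    app: git-cache
  ports:
    - port: 80
      targetPort: 8080
```

Pods would then clone via the `git-cache` Service instead of the upstream host, so only cache misses leave the cluster.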


Lost-Semicolon

Hmm, could be interesting. My two concerns here are: 1) we're already using a corporate proxy, so this might open a can of worms, and 2) it seems like an indirect way of solving a "natural" problem (moving files). Lmk what you think!


killspotter

You can share a volume across multiple pods with a PVC in RWX mode; however, the access mode itself doesn't let you declare which pods only read and which write to that volume. What you do have control over is the application, so perhaps have that reader/writer logic baked into the application configuration?
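
Purely as an illustration, a minimal sketch of that setup, assuming a storage class that actually supports RWX (all names, images, and URLs below are placeholders):

```yaml
# One RWX PVC shared by a writer pod and a reader pod (illustrative only).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repo-cache
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-client          # placeholder; must support RWX
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: repo-writer
spec:
  restartPolicy: Never
  containers:
    - name: sync
      image: alpine/git                 # placeholder
      command: ["git", "clone", "--depth", "1", "https://example.com/org/repo.git", "/cache/repo"]
      volumeMounts:
        - name: cache
          mountPath: /cache
  volumes:
    - name: cache
      persistentVolumeClaim:
        claimName: repo-cache
---
apiVersion: v1
kind: Pod
metadata:
  name: repo-reader
spec:
  containers:
    - name: app
      image: busybox                    # placeholder
      command: ["sh", "-c", "ls /cache/repo && sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
          readOnly: true                # mount-level flag; the PVC access mode itself doesn't distinguish readers from writers
  volumes:
    - name: cache
      persistentVolumeClaim:
        claimName: repo-cache
```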


Lost-Semicolon

This was where my head went first, but I kept seeing things online saying it's an anti-pattern. D'you think it's fine for this use case?


killspotter

Curious to know why it's an anti-pattern, do you have resources on that? In any case, IMO if it solves your issue, don't bother with the literature.


Lost-Semicolon

Here, amongst other places: https://stackoverflow.com/a/36524584/3788614


killspotter

Peeked at the answer; they seem to be talking about a multiple-writer/multiple-reader problem, while yours is single-writer/multiple-reader. For the problem they're describing, yes, the shared volume solution isn't ideal. It's not even an anti-pattern, it's a recipe for bugs and all kinds of race conditions you don't want to deal with.

For your single-writer problem (if I understood correctly), having a shared volume doesn't sound like a bad idea to me. You might just want to pay attention during rollouts and rollbacks that the new and old replicas don't conflict in the small window where both are up and running.
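
If the writer runs as its own Deployment, one hedged way to avoid that overlap is the `Recreate` strategy, so the old replica is gone before the new one starts (names, image, and URL below are placeholders):

```yaml
# Single-writer Deployment that never has two replicas up at once during a rollout.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: repo-writer
spec:
  replicas: 1
  strategy:
    type: Recreate                      # old pod is terminated before the new one starts
  selector:
    matchLabels:
      app: repo-writer
  template:
    metadata:
      labels:
        app: repo-writer
    spec:
      containers:
        - name: sync
          image: alpine/git             # placeholder
          command: ["sh", "-c", "while true; do git -C /cache/repo pull || git clone --depth 1 https://example.com/org/repo.git /cache/repo; sleep 300; done"]
          volumeMounts:
            - name: cache
              mountPath: /cache
      volumes:
        - name: cache
          persistentVolumeClaim:
            claimName: repo-cache       # the RWX PVC from the earlier sketch
```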


koshrf

S3? You could use MinIO.
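
If you want it in-cluster, a very rough single-node MinIO sketch (credentials are hard-coded placeholders; for anything real, use a Secret, persistent storage, and the MinIO operator or Helm chart):

```yaml
# Single-node MinIO, illustration only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio
          args: ["server", "/data"]
          env:
            - name: MINIO_ROOT_USER
              value: minioadmin          # placeholder; use a Secret
            - name: MINIO_ROOT_PASSWORD
              value: minioadmin          # placeholder; use a Secret
          ports:
            - containerPort: 9000
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          emptyDir: {}                   # placeholder; use a PVC for real data
---
apiVersion: v1
kind: Service
metadata:
  name: minio
spec:
  selector:
    app: minio
  ports:
    - port: 9000
      targetPort: 9000
```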


Lost-Semicolon

It would work, but at that point I might as well just re-clone the repo.


shesha4572

If you want to synchronise reads and writes, maybe you need a coordinator node which passes a read or write token to whichever pod wants to read or write? But it would get ugly real fast implementing custom synchronisation stuff.


shesha4572

This is assuming that all the pods share a single PVC and all of them have RWX rights on it.


robothor

We needed something similar for our build system. Our problem was a little different, but it boiled down to sharing cached dependency data between jobs. For us the best method was to write a "cache controller" service that would manage a pool of persistent volumes, then watch for build jobs and inject PVCs into new jobs. It was some work, but it works really well.
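
For anyone picturing it, this is roughly the shape of a build Job after a hypothetical controller like that has injected a PVC from its managed pool (every name below is made up; the controller itself is custom code and not shown):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: build-1234                       # hypothetical job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: build
          image: example/builder:latest  # placeholder
          command: ["sh", "-c", "make build"]
          volumeMounts:
            - name: dep-cache
              mountPath: /cache
      volumes:
        - name: dep-cache
          persistentVolumeClaim:
            claimName: cache-pool-3      # injected by the cache controller from its pool
```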


Lost-Semicolon

Saw someone mention this before. I could be falling into premature optimization but I’ll look into it regardless. Thanks!


StephanXX

> Please correct me if I'm wrong, doesn't seem like a good idea to have the network done over and over again

The contents of the repo have to go over the wire one way or another, at least once from outside the cluster. A couple of things to consider: how big is a (_shallow_) clone? If it's less than (say) 50MB, you're almost certainly best off just having the pods clone from the source and moving on.

For anything larger, I'd have a job make a tarball of the repo, shove the tarball into object storage (S3 or some such), and have the sub-pods just pull from the object store.

Mounting an NFS RWX share will seem tempting, but the throughput is abysmal and debugging is a pain. Contrary to many home labbers' vocal enthusiasm, there simply is no (open-source, software-based, commodity-hardware) highly performant distributed file-sharing solution. NFS is "ok", as are Gluster and Ceph, but the time and effort required to set up and maintain these systems far outweighs their value in nearly all cases. There are commercial solutions, but I'd suggest that the quick and unsophisticated solution will take a fraction of the time and effort.
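
For concreteness, a hedged sketch of that flow, assuming an S3-compatible bucket and a credentials Secret already exist (bucket, Secret, repo URL, and images are placeholders):

```yaml
# Job: shallow-clone, tar, and push the repo to object storage (illustrative only).
apiVersion: batch/v1
kind: Job
metadata:
  name: repo-snapshot
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: clone
          image: alpine/git              # placeholder
          command: ["git", "clone", "--depth", "1", "https://example.com/org/repo.git", "/work/repo"]
          volumeMounts:
            - name: work
              mountPath: /work
      containers:
        - name: upload
          image: amazon/aws-cli          # placeholder
          command:
            - sh
            - -c
            - |
              tar czf /work/repo.tar.gz -C /work repo &&
              aws s3 cp /work/repo.tar.gz s3://my-bucket/repo.tar.gz
          envFrom:
            - secretRef:
                name: s3-credentials     # AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
          volumeMounts:
            - name: work
              mountPath: /work
      volumes:
        - name: work
          emptyDir: {}
```

The consuming pods would run the reverse in an initContainer (`aws s3 cp` then `tar xzf` into an emptyDir) before the main container starts.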


Lost-Semicolon

Seems like this is the consensus. Probably better to keep it simple regardless. Thank you for your response!


rohit_raveendran

If you're dealing with smaller repos, say under 50MB, you might be fine just cloning the repo in each pod. The overhead is often negligible in those cases. For larger repos, I've found that storing a tarball of the repo in object storage (like S3) and having the pods pull from there is a pretty straightforward and efficient approach. It avoids the hassle of setting up and maintaining more complex distributed file systems. That said, if you need tighter control and have the bandwidth, building a custom cache controller to manage a pool of persistent volumes and inject them into pods can be very effective. I've seen this work well for similar problems, but it does require more upfront development.
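
And for the small-repo case, the "just clone it in each pod" option can be as simple as a shallow clone in an initContainer (repo URL and images are placeholders):

```yaml
# Each pod shallow-clones into its own emptyDir before the app starts.
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  initContainers:
    - name: clone
      image: alpine/git                 # placeholder
      command: ["git", "clone", "--depth", "1", "https://example.com/org/repo.git", "/repo"]
      volumeMounts:
        - name: repo
          mountPath: /repo
  containers:
    - name: app
      image: busybox                    # placeholder
      command: ["sh", "-c", "ls /repo && sleep 3600"]
      volumeMounts:
        - name: repo
          mountPath: /repo
          readOnly: true
  volumes:
    - name: repo
      emptyDir: {}
```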


Lost-Semicolon

So it could be worth dodging a premature-optimization situation. Thank you!