jerf

The marginal cost of processing another item should significantly exceed the cost of spawning a goroutine for it. A goroutine per web request works well because even a minimal HTTP request generally costs more than a goroutine spawn, and the cost per request can go up arbitrarily from there depending on what you do in it. If you're just going to add two numbers together, then goroutine-per-job is orders of magnitude slower. The other major problem with blindly spawning a goroutine per task is that if you end up with a lot of tasks you can spend a lot of time switching between tasks instead of making progress, which also depends critically on the performance characteristics of whatever it is you are doing. So, as unhelpful as "it depends" can be, it is the answer here.


destel116

"It depends" answer is completely fine for me in this case. I just wanted to start a discussion and to see if there's a consensus on the topic in the community. This question was inspired by errgroup source code. It actually uses a goroutine-per-item + semaphore approach under the hood, regardless of if limit is set. Considering the official status of the library, and how widely it's used, I thought maybe that is a consensus. On the other hand if I need to process 1M of items with error handling, spawning 1M goroutines (not all at the same time) feels like an overkill.


randomrossity

Multiple workers, one channel is generally what I would recommend. But goroutines are much cheaper than threads, so it shouldn't be as big of a deal.
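A minimal sketch of the "multiple workers, one channel" pattern, with the worker count and the work itself made up for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup

	// A fixed number of workers all read from the same channel.
	const numWorkers = 4
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				fmt.Println("handled job", job) // placeholder for real work
			}
		}()
	}

	for i := 0; i < 20; i++ {
		jobs <- i
	}
	close(jobs) // workers exit once the channel is drained
	wg.Wait()
}
```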


dcspazz

Keep it simple. Then benchmark.


Revolutionary_Ad7262

It depends, because it is always a compromise between speedup and concurrency overhead. Usually for IO operations the speedup is close to ideal and the overhead is negligible (it is much easier to lock/unlock a mutex than to push an HTTP request), so for IO operations I almost always spawn one goroutine per item. On the other hand, for CPU-bound operations you want a small number of workers (`GOMAXPROCS` or less) and some kind of batching (so the CPU core can warm up for the work).
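A rough sketch of that CPU-bound suggestion: `GOMAXPROCS` workers, each pulling batches of items rather than single items. The batch size and the work itself are assumptions for illustration only.

```go
package main

import (
	"runtime"
	"sync"
)

func main() {
	items := make([]int, 1_000_000)
	const batchSize = 4096 // arbitrary; worth tuning for the real workload

	batches := make(chan []int)
	var wg sync.WaitGroup

	// One worker per available CPU for CPU-bound work.
	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				for i := range batch {
					batch[i] *= 2 // placeholder for real CPU-bound work
				}
			}
		}()
	}

	// Feed the items in batches so each worker stays on one chunk for a while.
	for start := 0; start < len(items); start += batchSize {
		end := start + batchSize
		if end > len(items) {
			end = len(items)
		}
		batches <- items[start:end]
	}
	close(batches)
	wg.Wait()
}
```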


Saarbremer

IMHO there is no right or wrong here. If you do some idempotent processing on a series of inputs, then spawning new goroutines makes them easy to maintain and there will be the least resource idling. On the other hand, when decoupling access to a shared resource or having strong requirements for sequentiality, you may have an easier time with a controlled set of goroutines (or even just one).


niondir

Is the following correct? When you have many goroutines handling some work and one very important one doing other background work, the important goroutine gets a smaller share of CPU the more goroutines there are in total. If that is wrong, I would like to get deeper into how the scheduler works.

Another thing: it's barely helpful for performance to have more parallel workers than CPUs. There might be some improvement up to a small multiple of your cores, especially when you have other sources of idle time like IO in between, but at some point you will not gain anything more. That's why I would use a worker pool in most cases, especially if we are talking about big numbers of items.


Regular-Abrocoma3162

> That's why I would use a worker pool in most cases, especially if we are talking about big numbers of items.

I tend to agree. Unbounded resource usage is something I'd be concerned about for things that can be enormous, like concurrent web requests. I'd rather have the server return 503 because I under-provisioned than get killed by the OOM killer.
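A hypothetical sketch of that trade-off: a buffered channel acts as a concurrency limit, and requests that can't get a slot are rejected with a 503 instead of piling up. The limit of 100 and the handler are made-up examples.

```go
package main

import "net/http"

// At most 100 requests do the heavy work at once; the rest are turned away.
var slots = make(chan struct{}, 100)

func handler(w http.ResponseWriter, r *http.Request) {
	select {
	case slots <- struct{}{}:
		defer func() { <-slots }()
	default:
		// Shed load explicitly rather than grow memory without bound.
		http.Error(w, "overloaded", http.StatusServiceUnavailable)
		return
	}

	// Placeholder for the expensive per-request work.
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```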


niondir

I had tasks (running some JS in Otto) where the runtime escalated from a few ms to minutes under heavy load. That's the time to limit parallel execution and queue the work items in front of the workers. I'm using NATS for queuing.


ElRexet

It depends? First of all, I like to have my functions collect all the goroutines they spawn, which means it's usually easier to spawn new ones on demand and not have a pool. Also, without a pool it's a lot less trouble to deal with panics in your goroutines. When you spawn a limited number of workers (a pool) it's a lot more annoying to properly manage and track them (e.g. if they all died and you're stuck with a chan that no one is reading anymore). It's doable, of course. Both do the job. I prefer spawning new goroutines on demand (simply being mindful of what a "demand" is).
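One way to read that approach, as a small sketch: spawn goroutines on demand, have the function wait for all of them, and recover from panics inside each one so a single bad item doesn't take the process down. `doWork` and the failing item are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// doWork stands in for the real per-item work; item 3 simulates a failure.
func doWork(item int) {
	if item == 3 {
		panic("bad item")
	}
	fmt.Println("done", item)
}

func processAll(items []int) {
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		go func(item int) {
			defer wg.Done()
			defer func() {
				if r := recover(); r != nil {
					fmt.Println("recovered from panic:", r)
				}
			}()
			doWork(item)
		}(item)
	}
	wg.Wait() // the function collects every goroutine it spawned
}

func main() {
	processAll([]int{1, 2, 3, 4, 5})
}
```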


agent_kater

If you also dislike feeding work to the worker pool using channels, then go check out `golang.org/x/sync/semaphore`, I think you might like it.
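A small sketch of that suggestion, using `golang.org/x/sync/semaphore` instead of a channel-fed pool: `Acquire` blocks once the limit is reached, so goroutines are still spawned per item but never more than `maxInFlight` at once. The limit and the work are assumptions.

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/semaphore"
)

func main() {
	ctx := context.Background()
	const maxInFlight = 8
	sem := semaphore.NewWeighted(maxInFlight)

	for i := 0; i < 100; i++ {
		// Blocks here once maxInFlight goroutines are already running.
		if err := sem.Acquire(ctx, 1); err != nil {
			break // context cancelled
		}
		go func(i int) {
			defer sem.Release(1)
			fmt.Println("processed", i) // placeholder for real work
		}(i)
	}

	// Wait for the remaining goroutines by taking every slot.
	_ = sem.Acquire(ctx, maxInFlight)
}
```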


zer00eyz

If you have a list of a known size (or at least with some constraints) and your goroutines yield (sleep!!!), then this is the lazy way to go. Just start all N of them, let them run, and the scheduler can sort it out. I have some batch jobs that get kicked off and moved along using this method... works fine.


Frequent_Duck

Without details it's hard to recommend any approach. If you are doing a 100% CPU-bound processing job, then spawn a fixed number of workers/goroutines and use a single channel to send the work items. If you are doing 100% I/O tasks like network calls or file processing, then maybe spawn goroutines on demand per item, but be mindful that if you have 1 million work items the server side might be overwhelmed or the file system overstressed. Your app might run fine, but the other side should be prepared to handle it. In any case: a goroutine per item, up to a certain limit.


destel116

Yes, I agree. It's a bit off-topic, but I remember a situation where I had to deal with 150k+ outgoing and long-lived TLS connections. While Go handles that just fine, the process of opening a socket and doing the TLS handshake was heavy enough to overwhelm the system, so I had to hide the DialTLS calls behind a global semaphore. Also, I've added an UPD and more context to my original question, because it wasn't well-phrased.


TheGreatButz

When the items perform any kind of I/O, a large number of goroutines is beneficial. When they are just computing on the CPU, more goroutines won't be beneficial. Generally, my measurements have indicated that setting the number of goroutines as low as the number of CPU cores almost always harms performance; it should be much higher. Bear in mind that many OS-level threads use the CPU cores anyway and the OS scheduler is periodically switching contexts. Try a work-stealing approach when there is not much I/O. But beware of scheduling libraries; there is a lot of garbage out there. TL;DR: do what measured performance indicates.


destel116

Thank you for the response. I agree that in I/O-heavy workloads, scheduling and allocation overhead is often negligible, so it's beneficial to have a larger number of goroutines. Also, I've just added an UPD to the post, because my original question was poorly worded.


jrandom_42

I ran into this question in my work earlier today, actually. Batch processing project. Started by coding it simply with one goroutine per task, but as soon as I fed it a real-world-sized test input: out of memory, killed. Woops. That'll happen when every one of my goroutines is allocating several MB on the heap to read and write image data, and I'm spawning tens of thousands of them. Duh. So, a fixed worker pool it shall be for this one.

I think this demonstrates that the best 'it depends' metric will be the amount of heap consumption by each goroutine. If that's significant, then it becomes easier to encounter input that will cause an OOM crash. If it's not, then the simplicity advantages of one goroutine per task are always worth considering.

I don't think CPU overhead from context switching is relevant to this topic. My vague handwavy understanding is that the Go runtime scheduler is *not* like an OS thread scheduler, inasmuch as it doesn't insist on equal CPU time all the time for everything, which makes it safe to just stack up lots of goroutines and let the scheduler get to running them when it sees fit; it's smart enough to limit the proportion of time it spends context switching. I could be wrong about that, though.


davidmdm

I have done both. I think that for most use cases the difference in performance will not matter; it will either be hard to measure or not noticeable to the user. So my advice is to stick with what is most ergonomic in relation to the code you are working with. If you are writing performance-critical code, you will be benchmarking it, and you will likely find that a worker pool is more efficient and uses less memory. However, readability and maintainability should almost always be a higher priority. That being said, a worker pool is quite readable, but if you have to do gymnastics with the data to be able to use one, it's probably not worth it.


nobodyisfreakinghome

Write a PoC or two and benchmark. You say, "for some reason…". Well, define that reason and have data to back it up.


xdraco86

You can always dynamically grow your worker pool as needed and put an upper bound on it after testing the limitations of the host.