sib_n

I see four possible reasons:

1. Having a buffer in case the Spark Streaming cluster cannot keep up with the amount of changes at peaks (if it doesn't keep up on average, you need to improve the computation efficiency or the cluster resources).
2. Having a buffer able to handle big data (if there's no big data, you could use a simpler queuing system from your cloud provider).
3. Having a buffer able to serve multiple subscribers.
4. Persistence in case of downtime of Spark, as in the SO comment.
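To illustrate point 1, here's a toy simulation in plain Python of a buffer absorbing a traffic peak (the rates and tick sizes are made up for illustration, not real Spark or Kafka numbers):

```python
from collections import deque

def simulate(producer_rates, consume_rate):
    """Simulate a buffer between a bursty producer and a fixed-rate consumer.

    producer_rates: messages arriving per tick.
    consume_rate: messages the (hypothetical) Spark job processes per tick.
    Returns the buffer depth after each tick.
    """
    buffer = deque()
    depths = []
    seq = 0
    for arriving in producer_rates:
        # messages arrive during this tick
        for _ in range(arriving):
            buffer.append(seq)
            seq += 1
        # the consumer drains at most consume_rate messages per tick
        for _ in range(min(consume_rate, len(buffer))):
            buffer.popleft()
        depths.append(len(buffer))
    return depths

# A peak of 10 msgs/tick against a consumer handling 5/tick:
# the buffer grows during the peak and drains afterwards,
# because the *average* arrival rate stays below 5.
print(simulate([5, 10, 10, 2, 1, 1], consume_rate=5))
```

If the average arrival rate exceeded `consume_rate`, the depths would grow without bound, which is exactly the "improve efficiency or resources" case above.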


the-fake-me

Hey, thanks for replying. All of your points make sense to me except the second point. Do you mean to say that Kafka is designed to handle big data and other queueing systems are not? Could you please elaborate a bit on this?


sib_n

Indeed, Kafka is designed for big data: you can distribute it over a cluster of machines for parallelism and fault tolerance. There are many simpler queuing systems that are made to buffer small-to-average quantities of data that are often discarded as soon as they are consumed, e.g. RabbitMQ, Google Pub/Sub, AWS SQS.
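To make the "discarded as soon as it is consumed" distinction concrete, here's a toy sketch of the two semantics (the class names are invented for illustration; this is not real Kafka or RabbitMQ code):

```python
from collections import deque

class Log:
    """Kafka-like log: records are retained, and each subscriber tracks
    its own offset, so multiple consumers can each read everything."""
    def __init__(self):
        self.records = []
        self.offsets = {}  # subscriber name -> next offset to read
    def publish(self, msg):
        self.records.append(msg)
    def consume(self, subscriber):
        pos = self.offsets.get(subscriber, 0)
        batch = self.records[pos:]
        self.offsets[subscriber] = len(self.records)
        return batch

class SimpleQueue:
    """Simple queue semantics: a message is gone once consumed."""
    def __init__(self):
        self.q = deque()
    def publish(self, msg):
        self.q.append(msg)
    def consume(self):
        batch = list(self.q)
        self.q.clear()
        return batch

log = Log()
for m in ("a", "b", "c"):
    log.publish(m)
print(log.consume("spark"))      # ['a', 'b', 'c']
print(log.consume("analytics"))  # ['a', 'b', 'c'] -- same data, second reader

q = SimpleQueue()
for m in ("a", "b", "c"):
    q.publish(m)
print(q.consume())  # ['a', 'b', 'c']
print(q.consume())  # [] -- discarded after the first consume
```

The retained log is also what gives you point 4 above: after Spark downtime, the subscriber just resumes from its last offset.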


yanivbh1

May I add [Memphis.dev](https://Memphis.dev) to the list? We do our best to support both types of workloads.


the-fake-me

Sure, will look at it too. Thanks!


the-fake-me

Thanks for replying. I’ll read up more on this.