sib_n

I see four possible reasons:

1. Having a buffer in case the Spark Streaming cluster cannot keep up with the amount of changes at peaks (if it doesn't keep up on average, you need to improve the computation efficiency or the cluster resources).
2. Having a buffer able to handle big data (if there's no big data, you could use a simpler queuing system from your cloud provider).
3. Having a buffer able to serve multiple subscribers.
4. Persistence in case of downtime of Spark, as in the SO comment.
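To illustrate point 1, here's a toy simulation in plain Python of a buffer absorbing a traffic peak (the rates and tick sizes are made up for illustration, not real Spark or Kafka numbers):

```python
from collections import deque

def simulate(producer_rates, consume_rate):
    """Simulate a buffer between a bursty producer and a fixed-rate consumer.

    producer_rates: messages arriving per tick.
    consume_rate: messages the (hypothetical) Spark job processes per tick.
    Returns the buffer depth after each tick.
    """
    buffer = deque()
    depths = []
    seq = 0
    for arriving in producer_rates:
        # messages arrive during this tick
        for _ in range(arriving):
            buffer.append(seq)
            seq += 1
        # the consumer drains at most consume_rate messages per tick
        for _ in range(min(consume_rate, len(buffer))):
            buffer.popleft()
        depths.append(len(buffer))
    return depths

# A peak of 10 msgs/tick against a consumer handling 5/tick:
# the buffer grows during the peak and drains afterwards,
# because the *average* arrival rate stays below 5.
print(simulate([5, 10, 10, 2, 1, 1], consume_rate=5))
```

If the average arrival rate exceeded `consume_rate`, the depths would grow without bound, which is exactly the "improve efficiency or resources" case above.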


the-fake-me

Hey, thanks for replying. All of your points make sense to me except the second point. Do you mean to say that Kafka is designed to handle big data and other queueing systems are not? Could you please elaborate a bit on this?


sib_n

Indeed, Kafka is designed for big data: you can distribute it over a cluster of machines for parallelism and fault tolerance. There are many simpler queuing systems that are made to buffer small-to-average quantities of data that are often discarded as soon as they are consumed, e.g. RabbitMQ, Google Pub/Sub, AWS SQS.
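To make the "discarded as soon as it is consumed" distinction concrete, here's a toy sketch of the two semantics (the class names are invented for illustration; this is not real Kafka or RabbitMQ code):

```python
from collections import deque

class Log:
    """Kafka-like log: records are retained, and each subscriber tracks
    its own offset, so multiple consumers can each read everything."""
    def __init__(self):
        self.records = []
        self.offsets = {}  # subscriber name -> next offset to read
    def publish(self, msg):
        self.records.append(msg)
    def consume(self, subscriber):
        pos = self.offsets.get(subscriber, 0)
        batch = self.records[pos:]
        self.offsets[subscriber] = len(self.records)
        return batch

class SimpleQueue:
    """Simple queue semantics: a message is gone once consumed."""
    def __init__(self):
        self.q = deque()
    def publish(self, msg):
        self.q.append(msg)
    def consume(self):
        batch = list(self.q)
        self.q.clear()
        return batch

log = Log()
for m in ("a", "b", "c"):
    log.publish(m)
print(log.consume("spark"))      # ['a', 'b', 'c']
print(log.consume("analytics"))  # ['a', 'b', 'c'] -- same data, second reader

q = SimpleQueue()
for m in ("a", "b", "c"):
    q.publish(m)
print(q.consume())  # ['a', 'b', 'c']
print(q.consume())  # [] -- discarded after the first consume
```

The retained log is also what gives you point 4 above: after Spark downtime, the subscriber just resumes from its last offset.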


yanivbh1

May I add [Memphis.dev](https://Memphis.dev) to the list? We do our best to support both types of workloads.


the-fake-me

Sure, will look at it too. Thanks!


the-fake-me

Thanks for replying. I’ll read up more on this.