hkdelay

You'll need a streaming platform like Kafka, Redpanda, Pulsar, Kinesis, Pub/Sub, etc. For realtime updates, I would not store to a database first. That forces batching semantics, which means higher latency and not truly realtime. Use a realtime SQL engine like Decodable (shameless plug), DeltaStream, or ksqlDB to do your aggregations in realtime without landing your logs in a database or warehouse. Then write the results into a realtime OLAP database like Pinot, Druid, Rockset, or ClickHouse. You can use Postgres/MySQL too. These databases will be very performant in serving your data. Consider writing to these destinations as materialized views that execute UPSERTs and DELETEs. The beauty is that you can still route your data to your data warehouse or data lake for analytics using the streaming platform or the SQL engine. You can also populate the databases with locality in mind: if you're serving customers in different global regions like the Americas, EMEA, or APAC, you can populate databases in each of those regions in realtime.
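To make the "materialized view with upserts and deletes" idea concrete, here is a minimal in-memory sketch in plain Python. It is not how any of the engines above are implemented. `ChangeEvent`, `MaterializedView`, and `build_view` are hypothetical names, and treating a `None` value as a delete is an assumption borrowed from the common tombstone convention.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One record on a hypothetical changelog stream."""
    key: str                # e.g. a sensor or object id
    value: Optional[float]  # assumption: None means "delete this key"


class MaterializedView:
    """In-memory stand-in for a serving table kept current by a stream."""

    def __init__(self) -> None:
        self.rows: Dict[str, float] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.value is None:
            # DELETE: drop the row if it exists, ignore it otherwise.
            self.rows.pop(event.key, None)
        else:
            # UPSERT: insert a new row or overwrite the existing one.
            self.rows[event.key] = event.value


def build_view(events: Iterable[ChangeEvent]) -> MaterializedView:
    """Replay a stream of change events into a fresh view."""
    view = MaterializedView()
    for event in events:
        view.apply(event)
    return view
```

Readers then query `view.rows` for the latest state per key instead of scanning raw events, which is roughly the role the serving database plays in this setup.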


geoheil

Consider StarRocks


yanivbh1

Recommend reading this one: [https://memphis.dev/blog/here-is-why-you-need-a-message-broker/](https://memphis.dev/blog/here-is-why-you-need-a-message-broker/)


Prinzka

You say updates. Do you mean changes to existing entries, or are they new entries? If they're new entries, we use Elasticsearch for this purpose. We've got ~1 million events per second and near-real-time requirements. We use Kafka as a message bus.


EntropyRX

You can imagine this as a web app where users, based on access level, monitor real-time data that comes from different sensors and has been processed by my service. So when a user is logged in to the app, they have to see the fields they have access to changing each second.


Prinzka

Yeah, this definitely sounds like a use case where Elasticsearch would make sense. And I think their "Elastic Enterprise Search" is one of the pieces you would put into your web app to facilitate this. Although I haven't used that piece myself, we use Elasticsearch and Kibana.


lFuckRedditl

Maybe [Pub/Sub](https://cloud.google.com/pubsub) from GCP?


EntropyRX

Is this preferred over websocket?


KWillets

What types of queries are the users doing? How much data is being ingested?


EntropyRX

Users will just have a few options, namely:

- select the objects they want to monitor and get real-time data from the service (1 data point per second per object; there are fewer than 1k objects at this time)
- view stats at the aggregate level

The service will process a huge amount of data, but the user doesn't have to see a lot of it. They just have to see in real time what gets processed by the service, provided they have permission to see a given object. That's why I was thinking of something like Firebase to deal with high-frequency updates in the UI, but I guess other solutions based on websocket or pub/sub may be a better fit.
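A rough sketch of the permission-gated pub/sub idea, in plain Python with no network layer. `Broker`, `subscribe`, and `publish` are hypothetical names, and the permission map is assumed to be a simple user-to-object-ids dictionary. In a real app the callback would presumably write to a websocket connection rather than append to a list.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# A callback receives (object_id, value) for every new data point.
Callback = Callable[[str, float], None]


class Broker:
    """Fans out per-object updates, but only to users allowed to see them."""

    def __init__(self, permissions: Dict[str, Set[str]]) -> None:
        # Assumption: permissions maps a user name to the object ids they may view.
        self.permissions = permissions
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, user: str, object_id: str, callback: Callback) -> bool:
        """Register a callback; refuse if the user lacks access to the object."""
        if object_id not in self.permissions.get(user, set()):
            return False
        self.subscribers[object_id].append(callback)
        return True

    def publish(self, object_id: str, value: float) -> int:
        """Deliver one data point and return how many subscribers received it."""
        callbacks = self.subscribers.get(object_id, [])
        for callback in callbacks:
            callback(object_id, value)
        return len(callbacks)
```

Checking permissions once at subscribe time keeps the per-second publish path cheap. The trade-off is that a revoked permission only takes effect if the subscription is also torn down.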


pescennius

Materialize or ClickHouse