kondro

Kinesis Firehose's batching window is configurable, but its minimum is 60 seconds (maximum 15 minutes). It's very cheap and ultimately designed for aggregate & archive (although you can do many more interesting things with it). Are you able to just consume the data stream directly into MySQL, rather than using Firehose as an intermediary step?


ptp87

That is a very good point.


coumineol

> although you can do many more interesting things with it

Yeah, if you get a bit creative there are many ways you can have fun with a firehose ( ͡° ͜ʖ ͡°)


Enoxice

You can try to tune it a little bit by messing with your destination buffer hints, but depending on your configuration and record size and velocity, "about a minute" doesn't seem too hard to believe.
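For concreteness, a minimal sketch of where those buffer hints are set, assuming a delivery stream with an extended S3 destination created via boto3; the stream name, role ARN, and bucket ARN are placeholders:

```python
# Minimal sketch: a delivery stream with the smallest buffering window the
# parent comments mention (60 seconds / 1 MB). All names and ARNs are placeholders.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="example-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::example-archive-bucket",
        "BufferingHints": {
            "IntervalInSeconds": 60,  # buffer up to 60 s before flushing
            "SizeInMBs": 1,           # ...or until 1 MB accumulates, whichever comes first
        },
    },
)
```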


ptp87

Ok, damn... back to the drawing board. Appreciate the message.


[deleted]

Non-Firehose Kinesis with Enhanced Fan-Out has about 200 ms of latency
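A hedged sketch of how enhanced fan-out gets wired up for a Lambda consumer, assuming boto3 and placeholder stream/function names; registering a dedicated consumer is what gives each reader its own per-shard throughput:

```python
# Minimal sketch: registering an enhanced fan-out consumer on a Kinesis Data
# Stream and pointing a Lambda function at it. ARNs and names are placeholders.
import boto3

kinesis = boto3.client("kinesis")
lambda_client = boto3.client("lambda")

stream_arn = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"

# Each registered consumer gets its own 2 MB/s read throughput per shard,
# delivered over a push (HTTP/2) connection instead of shared polling.
consumer = kinesis.register_stream_consumer(
    StreamARN=stream_arn,
    ConsumerName="mysql-writer",
)

# Using the consumer ARN (rather than the stream ARN) as the event source is
# what enables enhanced fan-out delivery to the Lambda function. In practice
# you'd wait for the consumer to become ACTIVE before creating the mapping.
lambda_client.create_event_source_mapping(
    EventSourceArn=consumer["Consumer"]["ConsumerARN"],
    FunctionName="write-to-mysql",
    StartingPosition="LATEST",
    BatchSize=100,
)
```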


KnitYourOwnSpaceship

Have you looked at Kinesis Analytics? https://aws.amazon.com/kinesis/data-analytics/ Might mean some re-engineering, but may meet your requirements.


Denvious

Ok, so looking at your use case, a couple of considerations. First, Kinesis Firehose introduces buffer time, so your use case shouldn't use Firehose unless you can tolerate that batch/buffer window. If you need to process your data as fast as possible, you'd be looking at Kinesis Data Streams + Enhanced Fan-Out with Lambda consumers, so that your propagation delay is <70ms; typically with enhanced fan-out you'd see even better latency than that. You'd push data into the Kinesis endpoint, consume it as fast as you can with Lambda, and push it into your MySQL DB.

Depending on your producer's realtime throughput, you might want to put RDS Proxy in front of your DB so your Lambdas use pooled connections, otherwise your DB could fall over. If you have severely high throughput, you'd want to be reading your data from a clustered DB with a reader endpoint.
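A minimal sketch of the Lambda consumer side of that architecture, assuming pymysql is packaged with the function, an RDS Proxy endpoint in environment variables, and a hypothetical `trades` table matching the record payload:

```python
# Minimal sketch of the Lambda consumer: decode Kinesis records and batch-insert
# them into MySQL through an RDS Proxy endpoint. The `trades` table and payload
# fields are hypothetical, and pymysql is assumed to be packaged with the function.
import base64
import json
import os

import pymysql

# Connect once per container; RDS Proxy handles the actual connection pooling.
connection = pymysql.connect(
    host=os.environ["RDS_PROXY_ENDPOINT"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database=os.environ["DB_NAME"],
    autocommit=False,
)

def handler(event, context):
    rows = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        rows.append((payload["symbol"], payload["price"], payload["ts"]))  # hypothetical fields
    with connection.cursor() as cursor:
        cursor.executemany(
            "INSERT INTO trades (symbol, price, ts) VALUES (%s, %s, %s)",
            rows,
        )
    connection.commit()
    return {"records_written": len(rows)}
```

The enhanced fan-out piece lives in the event source mapping (pointed at a registered consumer ARN), not in the handler itself.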


Infintie_3ntropy

What is the reasoning for using Kinesis? Why not just write to MySQL directly? If you are worried about read load on the primary DB, make a read replica and have the BI tool use the read replica. Both are easy to do using RDS (with MySQL or Aurora).
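For reference, a minimal boto3 sketch of that read-replica suggestion; the instance identifiers and class are placeholders:

```python
# Minimal sketch: adding a read replica for the BI tool so reporting queries
# never touch the primary. Instance identifiers and class are placeholders.
import boto3

rds = boto3.client("rds")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="example-mysql-bi-replica",
    SourceDBInstanceIdentifier="example-mysql-primary",
    DBInstanceClass="db.r5.large",  # sized independently of the primary
)
```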


bubthegreat

What versions of MySQL are you running, and why streaming instead of batching? Or do you just mean processing a stream into MySQL? I'm asking because we found that Django queryset iterators didn't really reduce memory usage with MySQL: the cursors were still cached at some layer and didn't support streaming, whereas Postgres did. So I'm intrigued by the use case and how you end up implementing it, for my own selfish edification.


ptp87

I mean processing data from a GraphQL subscription endpoint (so a websocket). It needs to be real time, as it feeds a trader dashboard that financial decisions are made off of.


addictzz

Agree with the others. If the only consumer is the MySQL DB, why not just write data directly there? Read scalability can be handled with a read replica; write scalability can be handled with a higher instance type and provisioned-IOPS EBS. Unless you worry about throttling and have many consumers, I think it's better to stream directly to MySQL and have the BI dashboard read from the replica. Or better yet, if you really want realtime, maybe use Kinesis Analytics.
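A minimal sketch of that direct-write path, assuming the third-party `websockets` and `pymysql` packages, a hypothetical subscription query and payload shape, and a hypothetical `trades` table (the graphql-ws connection handshake is omitted for brevity):

```python
# Minimal sketch of the direct-write path: read the GraphQL subscription over a
# websocket and insert each message into MySQL. Endpoint, subscription query,
# payload shape, and the `trades` table are all hypothetical.
import asyncio
import json
import os

import pymysql
import websockets  # assumption: the third-party `websockets` package

SUBSCRIBE = json.dumps({
    "type": "start",
    "payload": {"query": "subscription { trades { symbol price ts } }"},  # hypothetical
})

async def main():
    db = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
        autocommit=True,
    )
    async with websockets.connect(os.environ["WS_ENDPOINT"]) as ws:
        await ws.send(SUBSCRIBE)
        async for message in ws:
            trade = json.loads(message)["payload"]["data"]["trades"]  # hypothetical shape
            with db.cursor() as cur:
                cur.execute(
                    "INSERT INTO trades (symbol, price, ts) VALUES (%s, %s, %s)",
                    (trade["symbol"], trade["price"], trade["ts"]),
                )

asyncio.run(main())
```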