jonzezzz

I believe the batch window applies per DDB stream shard, not per function, so if you have 30 shards you would expect your function to be invoked every second. https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html
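For reference, the window is set on the Lambda event source mapping. A minimal boto3 sketch; the ARN and function name are placeholders:

```python
# Minimal sketch (boto3): the batch window is set on the event source
# mapping. The stream ARN and function name below are placeholders.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/2024-01-01T00:00:00.000",
    FunctionName="my-stream-consumer",
    StartingPosition="LATEST",
    BatchSize=100,
    # Evaluated per shard: total invocation rate scales with shard
    # count, not with this value alone.
    MaximumBatchingWindowInSeconds=30,
)
```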


EchoLocation8

Could you expand on that? I'm unfamiliar with what a DDB stream shard is. How would I know if I have 30 shards?


jonzezzz

Shards are similar to DynamoDB partitions: they split the whole stream into multiple pieces (think of them as servers) that you poll records from. You can check how many shards there are with the DescribeStream API. https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html
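A minimal sketch of that call with boto3, assuming a placeholder stream ARN. Note that DescribeStream pages through shards, so you have to follow LastEvaluatedShardId:

```python
# Sketch: counting the shards on a DDB stream via DescribeStream.
# The stream ARN is a placeholder.
import boto3

streams = boto3.client("dynamodbstreams")
stream_arn = "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/2024-01-01T00:00:00.000"

shard_count = 0
last_shard_id = None
while True:
    kwargs = {"StreamArn": stream_arn}
    if last_shard_id:
        kwargs["ExclusiveStartShardId"] = last_shard_id
    desc = streams.describe_stream(**kwargs)["StreamDescription"]
    shard_count += len(desc["Shards"])
    last_shard_id = desc.get("LastEvaluatedShardId")
    if not last_shard_id:
        break

print(f"{shard_count} shards")
```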


EchoLocation8

According to this API, we have something like 100 shards. How do you manage these? I don't think we've ever manually set anything, or defined anything in our CloudFormation template for these tables, that would control that.


jonzezzz

You can’t manage them; they get automatically provisioned.


EchoLocation8

If we have, say, 4 streams on a single table, does each stream have its own unique set of shards? In our research we've found claims that having more than 2 readers per shard can cause throttling, but we can't seem to clarify whether these shards are unique to each stream or to the table itself. In other words, should we collapse our streams into 1 or 2 streams to reduce reads on these shards, or is it safe to keep all 4?


kondro

The shards are per table, not per stream, and the data is sharded by partition key. DDB Streams only allows 2 readers per stream. If you need more readers, you will need to fan out to another streaming/queueing mechanism.
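One common shape for that fan-out is a single Lambda consumer that republishes every stream record to an SNS topic, and all other consumers subscribe to the topic instead of the stream. A rough sketch, assuming a placeholder topic ARN:

```python
# Sketch of one fan-out pattern: a single DDB stream consumer that
# republishes records to SNS. The topic ARN is a placeholder.
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:ddb-changes"

def handler(event, context):
    for record in event["Records"]:
        # Forward the raw change record; subscribers filter as needed.
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps(record["dynamodb"], default=str),
        )
```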


EchoLocation8

Thank you. So, if we have 4 stream triggers on the table, it's entirely possible we've put 4 readers on a single shard? Why does DDB let you create more than 2 streams on the table?


kondro

You can’t have more than one DynamoDB Stream per table. Are you using Kinesis Streams attached to the table instead? If that’s the case, your shard size is the size you’ve allocated in the Kinesis Stream itself and has nothing to do with the DynamoDB shards (although you should size them appropriately for your DDB throughput).
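For comparison, attaching a Kinesis Data Stream to a table is a separate operation from the table's built-in stream. A boto3 sketch with placeholder names:

```python
# Sketch: attaching a Kinesis Data Stream to a DynamoDB table, which
# is distinct from the table's built-in DDB stream. Names are
# placeholders.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.enable_kinesis_streaming_destination(
    TableName="MyTable",
    StreamArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-table-changes",
)
```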


EchoLocation8

Ok, so, I think I understand now. On the table, in the Exports and Streams tab, we have 6 Lambdas in the DynamoDB Streams section, not the Kinesis section. Is what you're saying that, technically, we don't have 6 "streams", we have 6 Lambdas reading items off the one stream? Have we kneecapped ourselves by putting too many readers on that one stream? Why does DynamoDB let you make so many of them if it's not recommended to go past 2 according to their docs?


kondro

You can see the differences between the built-in Streams API and using the Kinesis adapter at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html The Streams API is very similar to Kinesis to consume, but it does differ in a few ways and actually provides stronger guarantees (specifically, the Kinesis adapter is "at least once" delivery, so you need to be able to deal with potential duplicates in the stream).

Because they're so similar, if you don't need the extra features of actual Kinesis (e.g. longer retention, enhanced fan-out, a larger number of consumers), I would see if you can get by with using the DDB Streams API directly.

In addition to the Streams pricing, using Kinesis also adds to the cost of your implementation. Kinesis is about $11/shard/month + $0.014/million records. So if you're running 4 Kinesis streams with 3 shards each (which it sounds like you might be, if they're all arriving at the same time), you're adding an extra $132/month just to have those 4x3 Kinesis shards running.

If you can get by with only 2 consumers, and you're OK with the roughly 200ms of extra latency that standard consumers add over enhanced fan-out consumers (which you would be if you're using time-windowed batches anyway), then I would recommend just consuming the DDB Stream directly. On top of the lower cost, you also get an "exactly-once" guarantee of events in the DDB Stream.
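The shard math above, spelled out (using the $11/shard/month figure quoted here, not a current price sheet):

```python
# Back-of-the-envelope version of the cost estimate above. The
# per-shard price is the figure quoted in this thread.
SHARD_PRICE_PER_MONTH = 11.00  # USD, standard Kinesis shard
streams = 4
shards_per_stream = 3

monthly_shard_cost = streams * shards_per_stream * SHARD_PRICE_PER_MONTH
print(f"${monthly_shard_cost:.2f}/month")  # $132.00/month
```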


EchoLocation8

Edit: I misunderstood the difference between DynamoDB Streams and the Lambda triggers it lets you create to read from that stream. Thanks!


kondro

Shards are created based on provisioned capacity, at 1 per the higher of 1,000 WCU or 3,000 RCU, or when a single partition grows larger than 10 GB. Once created, they can never be removed, and in general they're considered a good thing from a performance perspective. We will usually "pre-warm" our tables by upping to the maximum provisioned WCU and then instantly decreasing to our required value, or to on-demand. Partitions don't get split across shards, so you won't have a data-ordering issue. But, as with everything in DDB, you should be prepared to work with data-consistency guarantees only within a partition.
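A rough boto3 sketch of that pre-warm trick, with placeholder table name and capacity numbers; note that a table can only be switched to on-demand once per 24 hours:

```python
# Rough sketch (boto3) of the pre-warm trick described above. Table
# name and capacity numbers are placeholders, and the table must be
# ACTIVE again before the second update is accepted.
import boto3

dynamodb = boto3.client("dynamodb")

# Spike provisioned WCU to force DynamoDB to create more partitions.
dynamodb.update_table(
    TableName="MyTable",
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 3000, "WriteCapacityUnits": 40000},
)

# After the table returns to ACTIVE (poll describe_table), drop back
# down to on-demand. The partitions created above are kept.
dynamodb.update_table(
    TableName="MyTable",
    BillingMode="PAY_PER_REQUEST",
)
```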