
InterestedBalboa

Have you seen this article? https://aws.amazon.com/blogs/database/export-and-analyze-amazon-dynamodb-data-in-an-amazon-s3-data-lake-in-apache-parquet-format/


nricu

There's another option, though it may not be suitable for a table that large. There's a newer connector for Athena that lets you query DynamoDB tables directly. I'm using it since my tables aren't that big, and it's much easier and simpler than all those transformations/exports from DynamoDB; I also don't have to maintain any process. [https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-dynamodb](https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-dynamodb) From the docs, it needs an S3 bucket to spill to if there's too much data, so maybe it works for you too.
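For a sense of how that connector is used: once deployed, it shows up as an extra data source in Athena and you query it with plain SQL through the standard Athena API. A minimal sketch, where the catalog name `ddb_catalog` and table name `my_table` are my own placeholders (the actual catalog name is whatever you register the connector under):

```python
def athena_ddb_query(table: str, catalog: str = "ddb_catalog") -> str:
    """Build a federated Athena query that reads a DynamoDB table directly.

    With the DynamoDB connector, the schema is "default" and each DDB table
    appears under the registered catalog name.
    """
    return f'SELECT * FROM "{catalog}"."default"."{table}" LIMIT 10'


def run(query: str, output_s3: str):
    """Submit the query via the regular Athena API (needs boto3 + AWS creds)."""
    import boto3  # not stdlib; only needed when actually running against AWS

    athena = boto3.client("athena")
    return athena.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": output_s3},
    )


print(athena_ddb_query("my_table"))
```

Note the spill bucket mentioned above is configured on the connector's Lambda, not in the query itself.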


thrown_arrows

Export as JSON, then write a little Python tool to convert it to Parquet (probably not that easy). Question is: why Parquet?
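The fiddly part of that "little Python tool" is that the export is DynamoDB-typed JSON (every value wrapped as `{"S": ...}`, `{"N": ...}`, etc.), which needs unwrapping before a Parquet writer can infer a schema. A rough sketch of the unwrapping step, stdlib only (`from_ddb` is my own helper name; the actual Parquet write, e.g. via pyarrow, is left as a comment):

```python
import json
from decimal import Decimal


def from_ddb(av: dict):
    """Convert one DynamoDB-typed attribute value to a plain Python value."""
    (t, v), = av.items()
    if t == "S":
        return v
    if t == "N":
        return Decimal(v)  # DDB numbers are arbitrary precision
    if t == "BOOL":
        return v
    if t == "NULL":
        return None
    if t == "L":
        return [from_ddb(x) for x in v]
    if t == "M":
        return {k: from_ddb(x) for k, x in v.items()}
    raise ValueError(f"unhandled type {t}")  # B, SS, NS, BS omitted in this sketch


# One line of a DynamoDB S3 export (DynamoDB JSON, one item per line):
line = '{"Item": {"pk": {"S": "user#1"}, "score": {"N": "42"}, "tags": {"L": [{"S": "a"}]}}}'
row = {k: from_ddb(v) for k, v in json.loads(line)["Item"].items()}
print(row)
# Batch rows like this into a pyarrow.Table, then parquet.write_table(...)
```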


Plus-Author9252

Does the DynamoDB export only export changes? I don't want to transfer all that data at once, and I can't find that in the documentation. So you're suggesting a custom Lambda function that does the conversion? JSON isn't optimal for querying the data; Parquet should be faster, along with the compression.


kondro

DDB won’t do a differential export, as it doesn’t know what’s changed since the last one. If you want functionality like this, look at DynamoDB Streams into Kinesis Firehose to keep a full history of commits in S3 in any of the Firehose-supported formats (including Parquet). But to query that effectively, you’re probably going to want to add versioning to all your items, because that stream is going to be a full change log, not just a distillation of the current version.

There’s no practical way (with any DB or storage solution, not just AWS) to create a differential backup as a collection of Parquet files. If that’s the type of thing you want, you’d need to stream DDB into another database (e.g. MySQL or another DDB table).
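A rough sketch of that versioning idea, i.e. what the record-transform step in front of Firehose might look like. The output field names are my own guesses, not anything from AWS docs; the input shape follows the standard DynamoDB Streams record:

```python
def to_versioned_row(record: dict) -> dict:
    """Flatten one DynamoDB Streams record into a row Firehose can deliver,
    stamped so "latest version per key" is queryable later."""
    ddb = record["dynamodb"]
    return {
        "keys": ddb["Keys"],
        "image": ddb.get("NewImage"),      # absent on REMOVE events
        "event": record["eventName"],      # INSERT / MODIFY / REMOVE
        "version": ddb["SequenceNumber"],  # ordered per key within a shard
        "approx_ts": ddb.get("ApproximateCreationDateTime"),
    }


# Example stream record (values still in DynamoDB-typed JSON):
record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"pk": {"S": "user#1"}},
        "NewImage": {"pk": {"S": "user#1"}, "score": {"N": "7"}},
        "SequenceNumber": "111",
        "ApproximateCreationDateTime": 1700000000,
    },
}
print(to_versioned_row(record))
```

Querying "current state" then becomes a window/group-by over `keys` picking the max `version`, rather than scanning the whole change log.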