Clearing all existing entries from a DynamoDB table in AWS Data Pipeline

Problem description

My goal is to take a daily snapshot of an RDS table and load it into a DynamoDB table. The DynamoDB table should only ever contain data from a single day.

For this I have a Data Pipeline set up that queries the RDS table and publishes the results to S3 in CSV format.

A HiveActivity then imports this CSV into the DynamoDB table by creating external tables for the file and for the existing DynamoDB table.

This works great, but older entries from the previous day still exist in the DynamoDB table. I want to handle this within Data Pipeline if at all possible. I need to either:

1) Find a way to clear the DynamoDB table, or at least drop and recreate it, or
2) Include an extra column holding the snapshot date and find a way to clear out all older entries.

Any ideas on how I can do this?

Recommended answer

You can use DynamoDB Time to Live (TTL), which lets you set an expiration time on each item, after which the item is automatically deleted from the DynamoDB table. TTL is very useful when data loses its relevance after a specific time period; in your case the expiration time can be the start of the next day.
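As a rough illustration of the TTL approach, the sketch below computes an expiration value for "start of the next day" and enables TTL on a table. It assumes a boto3 DynamoDB client is passed in and that UTC midnight is the desired cutoff. The attribute name `expires_at`, the table name `daily_snapshot`, and the helper names are hypothetical and not taken from the question. DynamoDB expects the TTL attribute to be a Number holding a Unix epoch timestamp in seconds.

```python
from datetime import datetime, timedelta, timezone


def next_day_start_epoch(snapshot_time: datetime) -> int:
    """Return the Unix epoch (seconds) of the next UTC midnight after snapshot_time."""
    if snapshot_time.tzinfo is None:
        raise ValueError("snapshot_time must be timezone-aware")
    utc = snapshot_time.astimezone(timezone.utc)
    midnight_today = datetime(utc.year, utc.month, utc.day, tzinfo=timezone.utc)
    return int((midnight_today + timedelta(days=1)).timestamp())


def with_ttl(item: dict, snapshot_time: datetime, attribute_name: str = "expires_at") -> dict:
    """Return a copy of item with the (hypothetical) TTL attribute added."""
    enriched = dict(item)
    enriched[attribute_name] = next_day_start_epoch(snapshot_time)
    return enriched


def enable_ttl(client, table_name: str, attribute_name: str = "expires_at") -> None:
    """Turn on TTL for table_name; client is assumed to be boto3.client("dynamodb")."""
    client.update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": attribute_name},
    )


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# enable_ttl(boto3.client("dynamodb"), "daily_snapshot")
```

In the pipeline described in the question, the same epoch value would need to be written as an extra column of each row during the import step. Note that TTL deletion runs in the background rather than at the exact expiration time, so expired items can remain readable for a while; if the table must show strictly one day of data, reads may also need to filter on the expiration attribute.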


08-01 20:12