本文介绍了Dynamodb 中的 3 个字段复合主键(唯一项)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试创建一个表来在 DynamoDB 中存储发票行项目.假设项目由CompanyCodeInvoiceNumberLineItemId、金额和其他行项目详细信息定义.

唯一的项目由前 3 个属性的组合定义.对于不同的项目,这些属性中的任何 2 个都可以相同.我应该选择什么作为哈希属性和范围属性?

解决方案

Some Intro

为了效率,我会提出完全不同的设计.对于 NoSQL 数据库(DynamoDB 也不例外),我们总是需要首先考虑访问模式.此外,如果可能,我们应该努力将所有数据放入同一个表和多个索引中.根据我们从 OP 和他的评论中获得的信息,以下是两种访问模式:

  1. 对于 X 公司,获取完整的发票 Y(包括所有项目或项目范围)[基于此

    • PK 为 CompanyCode,SK 为 InvoiceNumber 可以存储该公司发票的所有属性.
    • 没有什么可以阻止我在 SK 是 Customer 的地方添加记录,这允许我存储关于公司的所有属性.
    • 使用 GSI1 ,我们将创建反向查找,其中 GSI1PK 是我的表 SK (InvoiceNumber),而我的 GSI1SK 是我的表 PK (CompanyCode).
    • 我使用同一个表来存储行项目,PK 为 LineItemId,SK 为 CompanyCode(仍然唯一)
    • 对于项目实体项目,我的 GSI1PK 仍然是 InvoiceNumber,而我的 GSI1SK 是 LineItemId,它是表 PK,因此它与发票实体项目相同.

    现在支持的访问模式:

    • 如果我想获得 X 公司和所有项目的发票 Y(访问模式 1):查询 CompanyCode=X 所在的表并使用 KeyConditionExpression>= 排序键 InvoiceNumber 上的运算符.如果我想获得与该发票相关联的所有项目,我将使用 ProjectionExpression 投影 Items 属性.
    • 通过检索先前查询公司 X 和发票 Y 的所有项目,我现在可以运行 BatchGetItem API 调用(使用我唯一的复合键 LineItemId+​​CompanyCode)表以获取属于该特定客户的该特定发票的所有项目.(这带有 BatchGetItem API 的一些限制)
    • 为了支持访问模式 2,我将在 PK 上使用 CompanyCode=X 进行查询,并使用 begins_with (a, substr) 在 SK 上使用 KeyConditionExpression 函数/运算符只获取 X 公司的发票,而不是该公司的元数据.这将为我提供给定公司/客户的所有发票.
    • 此外,使用上述 GSI1,对于任何给定的 InvoiceNumber,我可以轻松选择属于该特定发票的所有行项目.记住: 全球二级索引中的关键值不需要是唯一的 - 所以在我的 GSI1 中,我可以很容易地使用 invoice_1 -> (item_1, item_2) 然后另一个 invoice_1 -> (item_1,item_2) 但是 GSI 中的两个项目之间的区别是SK(它将与不同的CompanyCode 相关联(但出于演示目的,我使用了 invoice_1 和 invoice_2).

    I am trying to create a table to store invoice line items in DynamoDB. Let's say the item is defined by CompanyCode, InvoiceNumber and LineItemId, amount and other line item details.

    A unique item is defined by the combination of the first 3 attributes. Any 2 of those attributes can be same for the different items. What should I select as the Hash Attribute and the Range Attribute?

    解决方案

    Some Intro

    For efficiency I would propose totally different design. With NoSQL databases (and DynamoDB is not different) we always need to consider the access patterns first. Also, if possible we should strive to fit all our data within same table and several indexes. From what we have from OP and his comments, these are the two access patterns:

    1. For a company X, get complete invoice Y (including all items or range of items) [based on this comment ]
    2. Get all invoices for company X [ based on this comment ]

    We now wonder what is a good Primary Key? Translates to question what is a good Partition Key (PK) and what is a good Sort Key (SK) and which secondary indexes do we need to create and of what kind (local or global)? Some reminders:

    • Primary Key can be on one column or composite
    • Composite primary key consists of Partition Key and Sort Key
    • Partition key is used as input to the hashing function that will determine partition of the items
    • Sort key can also be composite, which allows us to model one-to-many relationships in DynamoDB as given in one of the comments links: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
    • When creating query on the table or index, you always need to use '=' operator on the Partition Key
    • When querying ranges on Sort Key you have option for KeyConditionExpression which provides you with set of operators for sorting and everything in between (one of them being function begins_with (a, substr) )
    • You are also allowed to use FilterExpression if you need to further refine the Query results (filter on the projected attributes)
    • Local Secondary Indexes (LSI) have same Partition Key but different Sort Key than your original table and give you different view of your data, organized according to an alternative Sort Key
    • Global Secondary Indexes (GSI) have different Partition Key and different Sort Key than your original table and give you completely different view on data
    • All items with the same partition key are stored together, and for composite Primary keys, are ordered by the sort key value. DynamoDB splits partitions by sort key if the collection size grows bigger than 10 GB.

    Back To Modeling

    It is obvious that we are dealing with multiple entities that need to be modeled and fit into the same table. To satisfy condition of Partition Key being unique on the table, CompanyCode comes as a natural Partition Key - so I would ensure that is unique. If not then you need to ask yourself how can you model the second access pattern?

    Assuming we have established uniqueness on the CompanyCode let's simplify and say that it comes in the form of an e-mail (or could be domain or just a code, but I will use email for demonstration).

    • Relationship between Company and Invoices is always 1:many.
    • Relationship between Invoice and Items is always 1:many.

    I propose design as in the image below:

    • With PK being CompanyCode and SK being InvoiceNumber can store all attributes about that invoice for that company.
    • Nothing prevents me to also add record where the SK is Customer which allows me to store all attributes about the company.
    • With GSI1 , we will create reverse lookup where GSI1PK is my tables SK (InvoiceNumber) and my GSI1SK is my tables PK (CompanyCode).
    • I am using same table to store line items with PK being LineItemId and SK being CompanyCode (still unique)
    • For Item entity items my GSI1PK is still InvoiceNumber and my GSI1SK is LineItemId which is tables PK so its same as for Invoice entity items.

    Now the access patterns supported with this:

    • If I want to get invoice Y for company X and all the items (access pattern 1): Query the table where CompanyCode=X and use KeyConditionExpression with = operator on the Sort Key InvoiceNumber. If I want to get all the items tied to that invoice, I will project Items attribute using ProjectionExpression.
    • By retrieving all the items with previous query for company X and invoice Y, I can now run BatchGetItem API call (using my unique composite key LineItemId+CompanyCode) on table to get all items belonging to that particular invoice of that particular customer. (this comes with some constraints of BatchGetItem API)
    • To support access pattern 2, I will do a query with CompanyCode=X on PK and use KeyConditionExpression on the SK with begins_with (a, substr) function/operator to get only invoices for company X and not the metadata about that company. That will give me all invoices for given company/customer.
    • Additionally, with above GSI1, for any given InvoiceNumber I can easily select all the line items that belong to that particular invoice. REMEMBER: The key values in a global secondary index do not need to be unique - so in my GSI1 I could have had easily invoice_1 -> (item_1, item_2) and then another invoice_1 -> (item_1,item_2) but the difference between two items in GSI would be in the SK (it would be associated with different CompanyCode (but for demonstration purposes I used invoice_1 and invoice_2).

    这篇关于Dynamodb 中的 3 个字段复合主键(唯一项)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-27 06:41