3 fields composite primary key (unique item) in Dynamodb
I am trying to create a table to store invoice line items in DynamoDB. Let's say the item is defined by CompanyCode
, InvoiceNumber
and LineItemId
, amount and other line item details.
A unique item is defined by the combination of the first 3 attributes. Any 2 of those attributes can be same for the different items. What should I select as the Hash Attribute and the Range Attribute?
Solution 1:
Some Intro
For efficiency I would propose totally different design. With NoSQL databases (and DynamoDB is not different) we always need to consider the access patterns first. Also, if possible we should strive to fit all our data within same table and several indexes. From what we have from OP and his comments, these are the two access patterns:
- For a company X, get complete invoice Y (including all items or range of items) [based on this comment ]
- Get all invoices for company X [ based on this comment ]
We now wonder what is a good Primary Key? Translates to question what is a good Partition Key (PK) and what is a good Sort Key (SK) and which secondary indexes do we need to create and of what kind (local or global)? Some reminders:
- Primary Key can be on one column or composite
- Composite primary key consists of Partition Key and Sort Key
- Partition key is used as input to the hashing function that will determine partition of the items
- Sort key can also be composite, which allows us to model one-to-many relationships in DynamoDB as given in one of the comments links: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
- When creating query on the table or index, you always need to use '=' operator on the Partition Key
- When querying ranges on Sort Key you have option for
KeyConditionExpression
which provides you with set of operators for sorting and everything in between (one of them being functionbegins_with (a, substr)
) - You are also allowed to use
FilterExpression
if you need to further refine the Query results (filter on the projected attributes) - Local Secondary Indexes (LSI) have same Partition Key but different Sort Key than your original table and give you different view of your data, organized according to an alternative Sort Key
- Global Secondary Indexes (GSI) have different Partition Key and different Sort Key than your original table and give you completely different view on data
- All items with the same partition key are stored together, and for composite Primary keys, are ordered by the sort key value. DynamoDB splits partitions by sort key if the collection size grows bigger than 10 GB.
Back To Modeling
It is obvious that we are dealing with multiple entities that need to be modeled and fit into the same table. To satisfy condition of Partition Key being unique on the table, CompanyCode
comes as a natural Partition Key - so I would ensure that is unique. If not then you need to ask yourself how can you model the second access pattern?
Assuming we have established uniqueness on the CompanyCode
let's simplify and say that it comes in the form of an e-mail (or could be domain or just a code, but I will use email for demonstration).
- Relationship between Company and Invoices is always 1:many.
- Relationship between Invoice and Items is always 1:many.
I propose design as in the image below:
- With PK being
CompanyCode
and SK beingInvoiceNumber
can store all attributes about that invoice for that company. - Nothing prevents me to also add record where the SK is
Customer
which allows me to store all attributes about the company. - With GSI1 , we will create reverse lookup where GSI1PK is my tables SK (
InvoiceNumber
) and my GSI1SK is my tables PK (CompanyCode
). - I am using same table to store line items with PK being
LineItemId
and SK beingCompanyCode
(still unique) - For Item entity items my GSI1PK is still
InvoiceNumber
and my GSI1SK isLineItemId
which is tables PK so its same as for Invoice entity items.
Now the access patterns supported with this:
- If I want to get invoice Y for company X and all the items (access pattern 1): Query the table where
CompanyCode=X
and useKeyConditionExpression
with=
operator on the Sort KeyInvoiceNumber
. If I want to get all the items tied to that invoice, I will projectItems
attribute usingProjectionExpression
. - By retrieving all the items with previous query for company X and invoice Y, I can now run
BatchGetItem
API call (using my unique composite keyLineItemId+CompanyCode
) on table to get all items belonging to that particular invoice of that particular customer. (this comes with some constraints of BatchGetItem API) - To support access pattern 2, I will do a query with
CompanyCode=X
on PK and useKeyConditionExpression
on the SK withbegins_with (a, substr)
function/operator to get only invoices for company X and not the metadata about that company. That will give me all invoices for given company/customer. - Additionally, with above GSI1, for any given
InvoiceNumber
I can easily select all the line items that belong to that particular invoice. REMEMBER: The key values in a global secondary index do not need to be unique - so in my GSI1 I could have had easily invoice_1 -> (item_1, item_2) and then another invoice_1 -> (item_1,item_2) but the difference between two items in GSI would be in the SK (it would be associated with differentCompanyCode
(but for demonstration purposes I used invoice_1 and invoice_2).
Solution 2:
I believe the first option offered by @georgeaf99 won't work, because if you do it that way, then CompanyCode
has to be unique in the table. Therefore, there would only be one item allowed per company. I think the second solution is the only real way to do it.
You can use CompanyCode
as the Hash Key, and then all other fields that combine to make the item unique (in this case InvoiceNumber
and LineItemId
) need to be somehow combined into one value (such as concatenation with a field delimiter), which would be your Range Key. Unfortunately that is kind of ugly, but that's the nature of a NoSQL database like DynamoDB. However, it will allow you to successfully store records with the correct uniqueness. When reading the records back, if you don't want to parse the combined field back out to its individual parts, then you'll have to add additional separate fields for InvoiceNumber
and LineItemID
.
If you don't have a large number of invoices per company, you can query by only the Hash Key and do the filtering on the client side. If you have a large number of invoices per company and need to be able to query only the items for a single invoice, then I would create a secondary index on CompanyCode and InvoiceNumber.