Items are identified solely by primary keys, more like a file system than a traditional database
Reads are eventually consistent by default, strongly consistent reads are also supported
Consistent performance at any scale
Primary Key (PK)
2 types of PKs
Partition Key (Hash Key), an attribute whose value is hashed to decide which partition of the distributed storage holds the item
Composite Key (Hash Key + Range / Sort Key), in addition to the Partition Key, a second key sorts the items within that partition
PK of an item must be unique to the table
For a composite key, the combination of both values must be unique, but either one of them may repeat
PKs are the only required attributes, all other attributes are optional and up to the needs of user
The PK of an item cannot be changed after the item is inserted, you can only create another item with a different PK
A PK can use a maximum of 2 attributes (Composite Key), to ensure uniqueness and quick access on more than 2 attributes, try combining multiple attributes into one
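The key rules above can be sketched with a toy in-memory table. This is only an illustration, not the DynamoDB API: the class ToyTable, the helper combine, the "#" separator and the use of md5 to pick a partition are all assumptions (DynamoDB's real hash function is internal).

```python
import hashlib
from bisect import insort


class ToyTable:
    """Toy model of a table with a composite key (hypothetical, not DynamoDB)."""

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # Each partition maps hash_key -> sorted list of sort keys.
        self.partitions = [dict() for _ in range(num_partitions)]
        self.items = {}  # (hash_key, sort_key) -> attributes

    def partition_for(self, hash_key):
        # Assumption: md5 stands in for DynamoDB's internal hash function.
        digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.num_partitions

    def put(self, hash_key, sort_key, attributes=None):
        pk = (hash_key, sort_key)
        if pk not in self.items:
            part = self.partitions[self.partition_for(hash_key)]
            insort(part.setdefault(hash_key, []), sort_key)
        # Writing the same PK again replaces the item, so the PK stays unique.
        self.items[pk] = dict(attributes or {})

    def sort_keys(self, hash_key):
        part = self.partitions[self.partition_for(hash_key)]
        return list(part.get(hash_key, []))


def combine(*parts):
    """Fold several attributes into one key value (the separator is an assumption)."""
    return "#".join(parts)


table = ToyTable()
table.put("user-1", combine("2024", "order", "0002"), {"total": 30})
table.put("user-1", combine("2024", "order", "0001"), {"total": 12})
table.put("user-2", combine("2024", "order", "0001"))  # same sort key, other hash key
table.put("user-1", combine("2024", "order", "0001"), {"total": 15})  # same PK, replaces
```

Here the hash key "user-1" repeats and the sort key "2024#order#0001" repeats, but each combination is stored only once.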
Secondary Indexes (SIs)
2 types of SIs
Local SI, share Partition Key but choose another Sort Key
Global SI, completely different set of PK
Do not enforce uniqueness like PKs
Local SI
Must be defined when the table is created, cannot add more after creation
Global SI
Can be defined any time
Just like creating another pointer table
Not synchronized with the original table in real time, updates reach the index asynchronously
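A rough sketch of the "pointer table" idea and of the delayed sync, using plain dictionaries. The class TableWithIndex and its pending queue are hypothetical and only model the behaviour described above, they say nothing about how DynamoDB implements indexes.

```python
from collections import deque


class TableWithIndex:
    """Base table plus one index that is updated after the fact (toy model)."""

    def __init__(self, index_attr):
        self.index_attr = index_attr
        self.items = {}         # base table: pk -> item
        self.index = {}         # index: attribute value -> set of pks (not unique)
        self.pending = deque()  # changes not yet applied to the index

    def put(self, pk, item):
        old = self.items.get(pk)
        self.items[pk] = item
        self.pending.append((pk, old, item))

    def propagate(self):
        # Apply queued changes to the index in the order they were written.
        while self.pending:
            pk, old, new = self.pending.popleft()
            if old is not None and self.index_attr in old:
                self.index[old[self.index_attr]].discard(pk)
            if self.index_attr in new:
                self.index.setdefault(new[self.index_attr], set()).add(pk)

    def query_index(self, value):
        return sorted(self.index.get(value, set()))


users = TableWithIndex("city")
users.put("u1", {"city": "Oslo"})
users.put("u2", {"city": "Oslo"})     # same index key twice is allowed
before = users.query_index("Oslo")    # index has not caught up yet
users.propagate()
after = users.query_index("Oslo")
```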
DynamoDB Streams
Conceptually like an SQS queue created in the background that receives a message whenever an item in the table is updated
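A minimal sketch of consuming such change messages, assuming the stream is read by a Lambda function. The fields eventName, dynamodb, Keys and NewImage follow the stream record format delivered to Lambda; the function name, the attribute name "pk" and the sample event are made up for illustration.

```python
def summarize_stream_event(event):
    """Collect (event name, keys) pairs from a DynamoDB Streams event."""
    changes = []
    for record in event.get("Records", []):
        name = record["eventName"]          # INSERT, MODIFY or REMOVE
        keys = record["dynamodb"]["Keys"]   # PK of the changed item
        changes.append((name, keys))
    return changes


# Hand-written sample event (not captured from a real table).
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"pk": {"S": "user-1"}},
                "NewImage": {"pk": {"S": "user-1"}, "name": {"S": "Ann"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"pk": {"S": "user-2"}}},
        },
    ]
}

changes = summarize_stream_event(sample_event)
```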
Global Tables
Creates replica in different regions
All replicas can read / write
When there are conflicts, last writer wins
Data is usually replicated asynchronously within seconds
Does not guarantee strong consistency across regions
Based on DynamoDB Streams
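The "last writer wins" rule can be illustrated with a small sketch. The Version record, the float timestamps and the tie-breaking choice are assumptions made for the example, not a description of how Global Tables compare writes internally.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: str
    written_at: float  # time of the write, in seconds (assumed representation)
    region: str


def resolve(a, b):
    """Return the version that wins under last-writer-wins (ties go to a)."""
    return a if a.written_at >= b.written_at else b


us = Version("shipped", 1700000005.0, "us-east-1")
eu = Version("cancelled", 1700000003.0, "eu-west-1")
winner = resolve(us, eu)  # the later write from us-east-1 is kept
```

The earlier write is silently discarded, which is why concurrent writes to the same item in different regions need care.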
DynamoDB Accelerator (DAX)
In-memory cache on an EC2 instance managed by DynamoDB
Components: cluster, nodes, subnet group
Write-through and eventual consistency
The DAX client acts as a drop-in replacement for the DynamoDB client, so users do not need to change business code when using DAX
A minimum of 3 nodes is recommended for high availability
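A toy sketch of a write-through cache with eventually consistent reads, to make the DAX behaviour above concrete. The class WriteThroughCache is hypothetical and uses a plain dict in place of the table; it is not the DAX client.

```python
class WriteThroughCache:
    """Minimal write-through cache in front of a backing store (toy model)."""

    def __init__(self, backing_store):
        self.store = backing_store  # stands in for the DynamoDB table
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def put(self, key, item):
        self.store[key] = item  # write goes to the table first
        self.cache[key] = item  # then the cache is updated

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        item = self.store.get(key)
        if item is not None:
            self.cache[key] = item
        return item


table = {"user-1": {"name": "Ann"}}
dax = WriteThroughCache(table)
first = dax.get("user-1")    # miss, loaded from the table
second = dax.get("user-1")   # hit, served from memory
dax.put("user-2", {"name": "Bob"})
table["user-1"] = {"name": "Changed"}  # a write that bypasses the cache
stale = dax.get("user-1")    # still the cached value: eventual consistency
```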
Data Operations
APIs
By PK: GetItem, PutItem, UpdateItem, DeleteItem
By SI: Query, Scan
Read all items: Scan
Batch: BatchGetItem, BatchWriteItem
Like calling GetItem with a loop
Transactional: TransactGetItems, TransactWriteItems
All or nothing
Query only searches items within a single partition, so it is most useful against a table or secondary index with a composite key
You have to set a single value for the Hash Key, and optionally a value or a range for the Sort Key
A single Query call returns a maximum of 1MB of data, further results are fetched with follow-up calls using LastEvaluatedKey
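A sketch of the Query rules above: one value for the Hash Key, an optional range for the Sort Key, and paging past the 1MB limit. The parameter names (KeyConditionExpression, ExpressionAttributeValues, ExclusiveStartKey, LastEvaluatedKey) are those of the Query API; the attribute names "pk" and "sk", the table name and fake_query are assumptions so the example runs without AWS access.

```python
def build_query(table_name, hash_value, sort_low=None, sort_high=None):
    """Build Query parameters in the low-level request format."""
    params = {
        "TableName": table_name,
        "KeyConditionExpression": "pk = :pk",  # exactly one Hash Key value
        "ExpressionAttributeValues": {":pk": {"S": hash_value}},
    }
    if sort_low is not None and sort_high is not None:
        # Optional range condition on the Sort Key.
        params["KeyConditionExpression"] += " AND sk BETWEEN :lo AND :hi"
        params["ExpressionAttributeValues"][":lo"] = {"S": sort_low}
        params["ExpressionAttributeValues"][":hi"] = {"S": sort_high}
    return params


def query_all(query_fn, params):
    """Follow LastEvaluatedKey until every page has been read."""
    items = []
    params = dict(params)  # do not modify the caller's dict
    while True:
        page = query_fn(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        params["ExclusiveStartKey"] = last_key


def fake_query(**params):
    """Stand-in for a real client call: returns two pages of results."""
    if params.get("ExclusiveStartKey") is None:
        return {
            "Items": [{"sk": {"S": "a"}}, {"sk": {"S": "b"}}],
            "LastEvaluatedKey": {"sk": {"S": "b"}},
        }
    return {"Items": [{"sk": {"S": "c"}}]}


params = build_query("Orders", "user-1", "2024-01", "2024-12")
items = query_all(fake_query, params)
```

With boto3, query_fn could be the query method of a DynamoDB client, for example boto3.client("dynamodb").query.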
Capacity Units (CUs)
Data requests are measured by CUs
2 types of CUs
Read Capacity Unit (RCU)
Write Capacity Unit (WCU)
To read an item up to 4KB
transactional = 2 RCUs
strong consistency = 1 RCU
eventual consistency = 0.5 RCU
To write an item up to 1KB
transactional = 2 WCU
otherwise = 1 WCU
Reading / writing non-existent items still consumes CUs
BatchGetItem internally performs GetItem in a loop, so it counts as multiple item reads and its CUs are calculated as if you made multiple GetItem calls
Query is different from BatchGetItem as it counts as one single operation, so CUs are calculated from the total size of the items returned, the number of items does not matter
Querying 100 items with a total size of 400KB using eventual consistency will only consume 400KB / (4KB * 2) = 50 RCUs
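The capacity arithmetic above can be written as a small calculator. This is a sketch under the rules stated in these notes (4KB read blocks, 1KB write blocks, sizes rounded up, at least one block charged); the function names are made up.

```python
import math

# RCUs per 4KB block for each read mode.
READ_COST = {"transactional": 2.0, "strong": 1.0, "eventual": 0.5}


def read_capacity_units(size_bytes, mode="eventual"):
    """RCUs for one read of size_bytes, rounded up to whole 4KB blocks."""
    blocks = max(1, math.ceil(size_bytes / 4096))  # a miss still costs one block
    return blocks * READ_COST[mode]


def write_capacity_units(size_bytes, transactional=False):
    """WCUs for one write of size_bytes, rounded up to whole 1KB blocks."""
    blocks = max(1, math.ceil(size_bytes / 1024))
    return blocks * (2 if transactional else 1)


def batch_get_rcus(item_sizes, mode="eventual"):
    """BatchGetItem: every item is charged like its own GetItem."""
    return sum(read_capacity_units(size, mode) for size in item_sizes)


def query_rcus(item_sizes, mode="eventual"):
    """Query: one operation, charged on the total size returned."""
    return read_capacity_units(sum(item_sizes), mode)


big = [4096] * 100    # 100 items, 400KB in total
small = [1024] * 100  # 100 items of 1KB each

query_big = query_rcus(big)          # 400KB / 4KB * 0.5
batch_small = batch_get_rcus(small)  # each 1KB item rounds up to a 4KB block
query_small = query_rcus(small)      # 100KB total, rounded up once
```

For small items the difference is large: the batch read pays for 100 separate 4KB blocks, while the Query pays for 25.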
Item Size
Maximum item size is 400KB, including attribute names, attribute values, local secondary index keys and projected attributes
Security
DynamoDB data is encrypted by default
3 options of encryption
DynamoDB manages the key, free
KMS manages the key automatically on behalf of the user, charged by KMS
KMS stores the key, but the key is manually managed by user, charged by KMS
Encryption and decryption of data are transparent to user
Backup
You can create on-demand backups manually or automatically using Lambda
When a backup is requested, DynamoDB sets a time marker and data are backed up asynchronously up to the state of the marked time
You can enable continuous backup for a table to be able to do a point-in-time recovery
Data is retained for 35 days
You may restore a table to any point in time from 35 days to 5 minutes before the current time
Backup does not affect latency or performance
Backup and restore do not consume provisioned throughput
Backups are billed by storage size / time
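The point-in-time recovery window described above (between 35 days and 5 minutes before now) can be checked with a few lines. This is only a sketch of the arithmetic; the function names are made up, and it ignores that the window cannot start before continuous backup was enabled.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=35)   # oldest restorable point, per the notes
MIN_LAG = timedelta(minutes=5)   # newest restorable point, per the notes


def restore_window(now):
    """Return (earliest, latest) restorable timestamps."""
    return now - RETENTION, now - MIN_LAG


def can_restore_to(target, now):
    earliest, latest = restore_window(now)
    return earliest <= target <= latest


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
one_hour_ago = can_restore_to(now - timedelta(hours=1), now)
one_minute_ago = can_restore_to(now - timedelta(minutes=1), now)
too_old = can_restore_to(now - timedelta(days=36), now)
```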
Billing
2 Modes of Billing
Pay per request, pay for the total number of CUs used
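Provisioned capacity, pay for the CUs-per-second you provision whether or not they are used, requests beyond the provisioned capacity are throttled unless the provision is raised (for example by auto scaling)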
Architectural Considerations
DynamoDB uses 10GB partitions to distribute data, partition selection depends on the Hash Key
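As a back-of-the-envelope sketch, the 10GB figure gives a lower bound on the number of partitions a table needs for storage alone. The function name is made up, and the estimate deliberately ignores every other factor that may influence how DynamoDB splits partitions.

```python
import math

PARTITION_SIZE_GB = 10  # partition size from the notes


def min_partitions_for_storage(table_size_gb):
    """Smallest number of 10GB partitions that can hold the table."""
    return max(1, math.ceil(table_size_gb / PARTITION_SIZE_GB))


small_table = min_partitions_for_storage(0.5)
medium_table = min_partitions_for_storage(25)
```

Because the Hash Key decides the partition, a table with few distinct Hash Key values cannot spread evenly across these partitions however many there are.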