DynamoDB Cheat Sheet

Overview

  • Serverless, distributed, NoSQL
  • Items are identified solely by their primary key, which makes a table behave more like a file system or key-value store than a traditional database
  • Reads are eventually consistent by default; strongly consistent reads are also supported
  • Designed for consistent performance at any scale

Primary Key (PK)

  • 2 types of PKs
    • Partition Key (Hash Key): an attribute whose value is hashed to decide which partition of the distributed storage holds the item
    • Composite Key (Hash Key + Range / Sort Key): in addition to the Partition Key, a second key sorts the items within that partition
  • PK of an item must be unique to the table
    • For a composite key, the combination of both values must be unique, but either value on its own may repeat
  • The PK attributes are the only required attributes; all other attributes are optional and up to the needs of the user
  • The PK of an item cannot be changed after the item is inserted; you can only create another item with a different PK
  • A PK can use at most 2 attributes (Composite Key); to ensure uniqueness and quick access on more than 2 attributes, try combining multiple attributes into one
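
As a rough illustration of combining several attributes into one key, the sketch below joins values into a single sort key string. The `#` delimiter, the helper names, and the in-memory `table` dict are assumptions made for the example; none of them are part of DynamoDB.

```python
SEP = "#"  # assumed delimiter; pick a character that never appears in the parts


def make_sort_key(*parts: str) -> str:
    """Join several attribute values into one sort key string."""
    for part in parts:
        if SEP in part:
            raise ValueError(f"part {part!r} contains the delimiter {SEP!r}")
    return SEP.join(parts)


def split_sort_key(key: str) -> list:
    """Recover the original attribute values from a combined sort key."""
    return key.split(SEP)


# Hypothetical in-memory stand-in for a table, keyed by (partition key, sort key).
table = {}


def put_item(pk: str, sk: str, attrs: dict) -> None:
    # Writing the same (pk, sk) pair again overwrites the item.
    table[(pk, sk)] = attrs


put_item("customer-1", make_sort_key("US", "WA", "Seattle"), {"orders": 3})
put_item("customer-1", make_sort_key("US", "CA", "Fresno"), {"orders": 1})
put_item("customer-1", make_sort_key("US", "WA", "Seattle"), {"orders": 4})

# Prefix lookup within one partition key value.
wa_keys = sorted(
    sk for (pk, sk) in table if pk == "customer-1" and sk.startswith("US#WA#")
)
```

Because the parts are ordered from general to specific, a Query with a `begins_with` condition on the sort key can select a whole prefix such as `US#WA#`.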

Secondary Indexes (SIs)

  • 2 types of SIs
    • Local SI: shares the table's Partition Key but uses a different Sort Key
    • Global SI: can use a completely different Partition Key and Sort Key
  • SIs do not enforce uniqueness the way PKs do

Local SI

  • Must be defined when the table is created; cannot be added after creation

Global SI

  • Can be defined at any time
  • Works much like a separate pointer table maintained by DynamoDB
  • Updated asynchronously from the base table, so reads from it are eventually consistent

DynamoDB Streams

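  • Records item-level changes (inserts, updates, deletes) in a time-ordered log, conceptually similar to a queue that receives a message whenever an item in the table changes
  • Stream records are kept for 24 hours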

Global Tables

  • Creates replicas in different regions
  • All replicas accept reads and writes
    • When there are conflicts, last writer wins
  • Data is usually replicated asynchronously within seconds
  • Does not guarantee strong consistency across regions
  • Based on DynamoDB Streams
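
A minimal sketch of the last-writer-wins rule, assuming each write carries a timestamp. The `Version` type and the tie-breaking choice are hypothetical; how DynamoDB actually timestamps and reconciles writes is internal to the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    written_at: float  # assumed write timestamp, in seconds
    region: str


def resolve(a: Version, b: Version) -> Version:
    """Keep whichever version was written last (ties keep `a`, an arbitrary choice)."""
    return a if a.written_at >= b.written_at else b


us = Version(value="status=shipped", written_at=100.0, region="us-east-1")
eu = Version(value="status=cancelled", written_at=100.4, region="eu-west-1")
winner = resolve(us, eu)
```

The earlier write is silently discarded, so concurrent updates to the same item in two regions should be avoided when both must survive.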

DynamoDB Accelerator (DAX)

  • In-memory cache running on nodes (EC2 instances) managed by AWS
  • Components: cluster, nodes, subnet group
  • Write-through caching; reads served from the cache are eventually consistent
  • The DAX client acts as a drop-in wrapper around the DynamoDB client, so business code does not need to change when adopting DAX
  • A minimum of 3 nodes is recommended for high availability
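
To make "write-through and eventual consistency" concrete, here is a small sketch of a write-through cache in front of a backing store. The class and its plain dicts are hypothetical stand-ins for DAX and a table; expiry (TTL) of cached entries is deliberately left out.

```python
class WriteThroughCache:
    """Toy write-through cache; not the DAX client API."""

    def __init__(self, store: dict):
        self.store = store      # stand-in for the DynamoDB table
        self.cache = {}         # stand-in for the cache's item storage
        self.store_reads = 0    # counts reads that had to go to the store

    def put(self, key, value):
        self.store[key] = value  # write to the backing store first
        self.cache[key] = value  # then keep the cache in step

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: the store is not touched
        self.store_reads += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = {}
dax = WriteThroughCache(store)
dax.put("a", 1)
first = dax.get("a")        # served from the cache
store["a"] = 2              # a write that bypasses the cache
stale = dax.get("a")        # still the old value until the entry is refreshed
```

The last two lines show why reads through a cache are only eventually consistent: a write that goes straight to the table is not visible until the cached entry is replaced.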

Data Operations

APIs

  • By PK: GetItem, PutItem, UpdateItem, DeleteItem
  • By PK or SI: Query, Scan
  • Read all items: Scan
  • Batch: BatchGetItem, BatchWriteItem
    • Like calling GetItem (or PutItem / DeleteItem) in a loop
  • Transactional: TransactGetItems, TransactWriteItems
    • All or nothing
  • Query only searches items within a single partition, so it is mainly useful against a table or secondary index with a composite key
    • You have to set a single value for the Hash Key, and optionally a value or a range for the Sort Key
    • A single Query call returns at most 1MB of data; larger result sets are paginated via LastEvaluatedKey
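
Because one Query call is capped at 1MB, reading a full result set means following the pagination key. The sketch below assumes a callable shaped like boto3's `Table.query` (keyword arguments in, a dict with `Items` and optionally `LastEvaluatedKey` out); `query_all` and `fake_query` are hypothetical names used only for this example.

```python
from typing import Any, Callable, Dict, Iterator


def query_all(
    query_page: Callable[..., Dict[str, Any]], **kwargs: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every item of a paginated Query by following LastEvaluatedKey."""
    while True:
        page = query_page(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return  # no more pages
        kwargs["ExclusiveStartKey"] = last_key


def fake_query(ExclusiveStartKey=None, **_ignored):
    """Hypothetical stand-in for a real Query call; returns 2 items per page."""
    data = [{"id": i} for i in range(5)]
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
    chunk = data[start:start + 2]
    page = {"Items": chunk}
    if start + 2 < len(data):
        page["LastEvaluatedKey"] = chunk[-1]
    return page


items = list(query_all(fake_query))
```

With boto3, a bound `Table.query` method should fit the `query_page` slot, with the key condition passed through as a keyword argument.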

Capacity Units (CUs)

  • Data requests are measured by CUs
  • 2 types of CUs
    • Read Capacity Unit (RCU)
    • Write Capacity Unit (WCU)
  • To read an item up to 4KB
    • transactional = 2 RCUs
    • strong consistency = 1 RCU
    • eventual consistency = 0.5 RCU
  • To write an item up to 1KB
    • transactional = 2 WCUs
    • otherwise = 1 WCU
  • Reading / writing non-existent items still consumes CUs
  • BatchGetItem internally performs a GetItem per item, so each item is rounded up and charged separately, as if you had made multiple GetItem calls
  • Query differs from BatchGetItem in that it counts as one single operation: CUs are calculated from the total size of the items returned, and the number of items does not matter
    • Querying 100 items with a total size of 400KB using eventual consistency consumes only 400KB / (4KB * 2) = 50 RCUs
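
The capacity rules above can be worked through in a few lines. This is a sketch of the arithmetic only; the function names are made up for the example, and sizes are given in KB.

```python
import math

# RCUs per 4KB block for each read mode, as listed above.
READ_MULTIPLIER = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}


def rcus(size_kb: float, mode: str = "eventual") -> float:
    """RCUs for one read of `size_kb`, rounded up to whole 4KB blocks."""
    blocks = max(1, math.ceil(size_kb / 4))  # a missing item still costs one block
    return blocks * READ_MULTIPLIER[mode]


def wcus(size_kb: float, transactional: bool = False) -> int:
    """WCUs for one write of `size_kb`, rounded up to whole 1KB blocks."""
    blocks = max(1, math.ceil(size_kb))
    return blocks * (2 if transactional else 1)


def batch_get_rcus(item_sizes_kb, mode: str = "eventual") -> float:
    """BatchGetItem: every item is rounded up and charged on its own."""
    return sum(rcus(size, mode) for size in item_sizes_kb)


def query_rcus(item_sizes_kb, mode: str = "eventual") -> float:
    """Query: sizes are summed first, then rounded up once."""
    return rcus(sum(item_sizes_kb), mode)


small_items = [1] * 100                 # 100 items of 1KB each
batch_cost = batch_get_rcus(small_items)
query_cost = query_rcus(small_items)
```

For many small items the gap is large: each 1KB item is charged as a full 4KB block in a batch read, while Query charges for the 100KB total.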

Item Size

  • Maximum item size is 400KB, counting attribute names and values; for tables with local secondary indexes, the index keys and projected attributes stored for the item count toward the limit as well

Security

  • DynamoDB data is encrypted by default
  • 3 encryption options
    • AWS owned key: DynamoDB manages the key, free
    • AWS managed key: KMS manages the key automatically on behalf of the user, charged by KMS
    • Customer managed key: KMS stores the key, but the user manages it manually, charged by KMS
  • Encryption and decryption of data are transparent to user

Backup

  • You can create on-demand backups manually, or automatically using Lambda
    • When a backup is requested, DynamoDB sets a time marker and backs up the data asynchronously up to its state at the marked time
  • You can enable continuous backup on a table to allow point-in-time recovery
    • Recovery data is retained for up to 35 days
    • You can restore a table to any point in time between 35 days and 5 minutes before the current time
  • Backup does not affect latency or performance
  • Backup and restore do not consume provisioned throughput
  • Backups are billed by storage size / time
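
The point-in-time recovery window is simple date arithmetic. The sketch below assumes the 35-day and 5-minute figures above, and also assumes that nothing earlier than the moment continuous backup was enabled can be restored; `restorable_window` and `enabled_at` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def restorable_window(
    now: datetime,
    enabled_at: datetime,
    retention_days: int = 35,
    lag_minutes: int = 5,
) -> Tuple[datetime, datetime]:
    """Return the (earliest, latest) timestamps a table could be restored to."""
    earliest = max(enabled_at, now - timedelta(days=retention_days))
    latest = now - timedelta(minutes=lag_minutes)
    return earliest, latest


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)

# Continuous backup enabled long ago: the window is limited by retention.
old = datetime(2023, 1, 1, tzinfo=timezone.utc)
earliest_old, latest_old = restorable_window(now, old)

# Enabled 2 days ago: the window starts at the enable time.
recent = now - timedelta(days=2)
earliest_recent, _ = restorable_window(now, recent)
```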

Billing

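  • 2 billing modes
    • On-demand (pay per request): pay for the total number of request units actually consumed
    • Provisioned capacity: pay for the CUs-per-second you provision, whether or not they are used; requests beyond the provisioned rate are throttled unless burst capacity or auto scaling absorbs them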

Architectural Considerations

  • DynamoDB distributes data across partitions of roughly 10GB each; the Hash Key of an item determines which partition it is stored on
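
To show why the choice of Hash Key matters, the sketch below maps key values to partitions with a hash. DynamoDB's real hash function and partition map are internal, so the use of MD5, the modulo step, and the fixed partition count are all assumptions for illustration.

```python
import hashlib
from collections import Counter


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a Hash Key value to a partition index (illustrative scheme only)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


NUM_PARTITIONS = 4  # hypothetical; the service decides the real number

# Many distinct key values spread across the partitions.
spread = Counter(partition_for(f"user-{i}", NUM_PARTITIONS) for i in range(1000))

# A single hot key value always lands on the same partition.
hot = Counter(partition_for("user-42", NUM_PARTITIONS) for _ in range(1000))
```

A key with few distinct values, or one value that receives most of the traffic, concentrates load on one partition no matter how many partitions exist.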
