Redshift Cheat Sheet

Overview

  • A managed data warehouse
  • Based on ParAccel solution, in turn based on PostgreSQL

Node Types

  • DC2 = dense compute, SSD-backed, good for compute and I/O intensive workloads
  • DS2 = dense storage, HDD-backed, good for huge dataset
  • RA3 = good general purpose, recommended to replace DC2 and DS2
  • It is odd that AWS does not have a page dedicated to node types, some info is buried in pricing page

Resizing

  • Elastic resizing, can resize cluster and change node types, resizing done in minutes
  • Classic resizing, resizing done in hours and days, not recommended anymore

Distribution Styles

  • How data is distributed in cluster
  • Styles
    • AUTO = automatically select from below strategies
    • ALL = full copy of data in every node, ok for smaller, not-updated-often tables
      • No significant performance gains for smaller table
    • EVEN = distribute in round-robin fashion, good if no joins
    • KEY = distribute by selected column, good when joining by those columns

COPY Command

  • Use COPY command whenever you can, much more efficient than individual INSERTs
  • Use a single COPY command, with multiple files, to enable parallel COPYing
    • AVOID using multiple commands
    • AVOID using a single file, split the file as a multiple of your slices

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s