ARC340-R1 – Amazon.com automating machine learning deployments at scale (re:Invent 2019) – Key Takeaways

NOTE: I would personally consider this a 400-level presentation: very informative and detailed. Still approachable for 200-level beginners, though.

The Key

  • How the Model Development Life Cycle (MDLC) compares and contrasts with the SDLC

The Takeaways

  • The nondeterministic nature of ML makes deploying and maintaining a production system difficult
  • Comparing SDLC status quo with Model Development Life Cycle (MDLC)
  • MDLC is highly manual or human-driven
  • Automating the MDLC process
    • Stage 1: Data sourcing
      • Features are hard to find: different column names, duplicates, etc.
        • Use metadata store, ETL
      • Pulling data directly from the production database is bad
        • Use streaming instead
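A minimal sketch of the metadata-store idea for feature discovery: map the inconsistent raw column names back to one canonical feature. All names here are invented for illustration, not from the talk.

```python
# Hypothetical metadata store: canonical feature name -> known raw aliases.
METADATA = {
    "customer_id": ["cust_id", "customerId", "customer_id"],
    "order_total": ["total", "order_amt", "order_total"],
}

def resolve(column):
    """Map a raw column name to its canonical feature name, if known."""
    for canonical, aliases in METADATA.items():
        if column in aliases:
            return canonical
    return None  # unknown column: flag it for the ETL/metadata team
```

An ETL job would call `resolve` on every incoming column so that duplicated or renamed columns land under a single feature.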
    • Stage 2: Data quality monitoring
      • Unstructured data is not like an API: there are no contracts, and fields are overloaded everywhere
        • A dedicated system to subscribe to the stream and check data quality
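A sketch of what such a dedicated quality checker might do. The schema, field names, and checks are hypothetical; in the talk's setup this would subscribe to the data stream, while here the "stream" is just an iterable of records.

```python
# Hypothetical expected schema: field name -> expected Python type.
EXPECTED_SCHEMA = {"order_id": str, "price": float, "quantity": int}

def check_record(record):
    """Return a list of quality violations for one record."""
    issues = []
    for field, ftype in EXPECTED_SCHEMA.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            issues.append(f"bad type for {field}: {type(record[field]).__name__}")
    if record.get("price", 0) < 0:
        issues.append("negative price")
    return issues

def monitor(stream):
    """Consume the stream and count violations per issue type."""
    counts = {}
    for record in stream:
        for issue in check_record(record):
            counts[issue] = counts.get(issue, 0) + 1
    return counts
```

The counts could then feed an alarm when any violation rate crosses a threshold.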
    • Stage 3: Feature engineering
      • Use in-place calculation
      • Aggregations over a past time window can be expensive
        • Incremental calculation engine
        • Intermediate data store
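A toy version of the incremental-calculation idea, assuming a rolling-sum feature over a time window. Instead of rescanning the whole window on every event, it keeps a running total plus the in-window events (the "intermediate data store") and adjusts the total as events arrive and expire. Class and parameter names are illustrative.

```python
from collections import deque

class RollingSum:
    """Incrementally maintained sum over the last `window_seconds` of events."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, value) pairs currently in the window
        self.total = 0.0

    def add(self, timestamp, value):
        # O(1) amortized update: add the new event, evict expired ones.
        self.events.append((timestamp, value))
        self.total += value
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old = self.events.popleft()
            self.total -= old
        return self.total
```

Example: with a 60-second window, an event at t=0 stops contributing once an event at t=70 arrives.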
    • Stage 4: Model development
      • Built-in algorithms
      • Marketplace models
      • Trainers are created as containers and SageMaker can pull images from ECR, run and manage them
    • Stage 5: Model training & evaluation
      • Train model with assets and tune hyperparameters > get model artifacts > create model with artifacts > Test model with holdout data > Measure performance
        • *Holdout data = data similar to the training data but unseen during training
      • Challenge: tracking every change across data sets, parameters, and models is cumbersome and difficult; SageMaker Experiments addresses this
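The evaluate-on-holdout step, reduced to a pure-Python sketch. A real pipeline would pull the model artifact and holdout set from storage, and SageMaker Experiments would do the bookkeeping that `log_experiment` hand-rolls here; all names are mine, not the talk's.

```python
def accuracy(model, holdout):
    """Fraction of holdout (features, label) pairs the model predicts correctly."""
    correct = sum(1 for features, label in holdout if model(features) == label)
    return correct / len(holdout)

def log_experiment(registry, params, score):
    # Stand-in for what SageMaker Experiments automates: recording which
    # data/parameter combination produced which metric.
    registry.append({"params": params, "holdout_accuracy": score})
```

With every run logged this way, comparing candidate models becomes a lookup instead of detective work.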
    • Stage 6: Model deployment & inference
      • 2 deployment types
        • Offline, with batch inputs and outputs, e.g., a recommendation system
        • Real-time, using endpoints, e.g., fraud detection
      • Challenge: situations can be complex; even with an offline deployment, if a customer only needs a subset of the data, an endpoint may be the better choice
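One way to picture the two deployment types: the same scoring function sits behind both a real-time handler and a batch job. The placeholder model and field names are invented for illustration.

```python
def score(features):
    # Placeholder model: a real system would load a trained artifact.
    return 1.0 if features.get("amount", 0) > 100 else 0.0

def realtime_handler(request):
    """Real-time path (fraud-detection style): one request, one response."""
    return {"score": score(request)}

def batch_job(records):
    """Offline path (recommendation style): score a whole input file at once."""
    return [score(r) for r in records]
```

Keeping the scoring logic shared is what makes it cheap to switch a consumer from the batch output to an endpoint when they only need a subset of the data.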
    • Stage 7: Model monitoring
      • 2 target types
        • Closed-loop: accuracy; ensure users always get accurate results
        • Open-loop: consistency; ensure users always get the same result
    • Stage 8: Client integration
  • MDLC workflows
    • Use AWS Batch, Step Functions, and Lambda to create an automated workflow
    • 2 examples of workflows
      • Offline (batch)
      • Online
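A hedged sketch of what the offline (batch) workflow might look like as a Step Functions state machine, written in Amazon States Language as a Python dict. The state names and resource ARNs are placeholders, not the talk's actual workflow.

```python
import json

batch_workflow = {
    "Comment": "Offline MDLC: source data, engineer features, run batch inference",
    "StartAt": "SourceData",
    "States": {
        "SourceData": {
            "Type": "Task",
            # Placeholder Lambda ARN for the data-sourcing step.
            "Resource": "arn:aws:lambda:region:acct:function:source-data",
            "Next": "EngineerFeatures",
        },
        "EngineerFeatures": {
            "Type": "Task",
            # Step Functions service integration: run an AWS Batch job and wait.
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Next": "BatchInference",
        },
        "BatchInference": {
            "Type": "Task",
            # Service integration: run a SageMaker batch transform job and wait.
            "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
            "End": True,
        },
    },
}

# The JSON form is what you would upload as the state machine definition.
definition = json.dumps(batch_workflow)
```

The online variant would swap the final batch-transform state for endpoint deployment and keep the endpoint serving real-time requests.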

Bottom line: today's SDLC is automated; the MDLC is still human-driven.