AWS Big Data Media Platform

Objectives

  • Replace the Databricks vendor platform and Redshift-centric ecosystem with a native AWS ecosystem: EMR and Jupyter Notebooks on a Hive-centric data lake, with Redshift serving as a datamart.
  • Improve data analysis access and ETL speed, reliability, and cost-effectiveness.
  • Apply medallion (bronze, silver, gold) data triage to the engagement, ad-sales, and content data domains.
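
As an illustration of the medallion tiers named above, here is a minimal pure-Python sketch of how one record set might move from bronze to gold. The field names (user_id, content_id, watch_seconds), the sample rows, and the cleaning rules are all assumptions made for illustration; the platform itself runs this kind of logic in dbt on Spark, not in plain Python.

```python
from collections import defaultdict

# Hypothetical raw engagement events, as they might land in the bronze tier.
BRONZE_EVENTS = [
    {"user_id": "u1", "content_id": "c1", "watch_seconds": "120"},
    {"user_id": "u1", "content_id": "c1", "watch_seconds": "120"},  # duplicate
    {"user_id": "u2", "content_id": "c1", "watch_seconds": "not-a-number"},
    {"user_id": "u3", "content_id": "c2", "watch_seconds": "45"},
]


def to_silver(bronze_rows):
    """Clean bronze rows: cast types, drop unparseable rows and duplicates."""
    seen = set()
    silver_rows = []
    for row in bronze_rows:
        try:
            seconds = int(row["watch_seconds"])
        except ValueError:
            continue  # value cannot be cast, so the row is excluded from silver
        key = (row["user_id"], row["content_id"], seconds)
        if key in seen:
            continue  # exact duplicate of a row already kept
        seen.add(key)
        silver_rows.append(
            {
                "user_id": row["user_id"],
                "content_id": row["content_id"],
                "watch_seconds": seconds,
            }
        )
    return silver_rows


def to_gold(silver_rows):
    """Aggregate silver rows into total watch time per content item."""
    totals = defaultdict(int)
    for row in silver_rows:
        totals[row["content_id"]] += row["watch_seconds"]
    return dict(totals)


silver = to_silver(BRONZE_EVENTS)
gold = to_gold(silver)
print(gold)  # {'c1': 120, 'c2': 45}
```

Bronze keeps the data as received, silver holds the validated and de-duplicated rows, and gold holds the aggregate that a reporting datamart would serve.
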
Existing Challenges

  • Disparate Data Access: Users can’t easily gain access to, or query, data across domains.
  • Database Issues: Slow queries; queries that lock up or time out; competition for database load; usage limits.
Solutions

  • AWS EMR with dbt on Spark for ETL. EMR Jupyter Notebooks for analysis. GitHub Actions with Terraform for CI/CD.
  • S3 with Glue databases for lake storage. Redshift datamart for reporting storage.
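
One way to picture the lake storage layout is a small naming helper that maps each data domain and medallion layer to a Glue database name and an S3 prefix. This is only a sketch under stated assumptions: the one-database-per-domain-and-layer convention, the bucket name example-media-lake, and the function names are hypothetical and not taken from the actual platform.

```python
# Domains and layers as listed in the objectives; "ad_sales" uses an
# underscore because hyphens are awkward in Hive-style identifiers.
DOMAINS = ("engagement", "ad_sales", "content")
LAYERS = ("bronze", "silver", "gold")


def _validate(domain, layer):
    """Reject names outside the known domains and layers."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")


def glue_database(domain, layer):
    """Return a hypothetical Glue database name, e.g. 'engagement_silver'."""
    _validate(domain, layer)
    return f"{domain}_{layer}"


def s3_location(bucket, domain, layer):
    """Return a hypothetical S3 prefix for the tables of one domain and layer."""
    _validate(domain, layer)
    return f"s3://{bucket}/{layer}/{domain}/"


# Print the full layout for an assumed bucket name.
for layer in LAYERS:
    for domain in DOMAINS:
        print(glue_database(domain, layer), s3_location("example-media-lake", domain, layer))
```

Keeping the convention in one place means ETL models, notebooks, and infrastructure definitions can all derive the same names instead of hard-coding them separately.
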
Benefits

  • Long-term Flexibility, Reliability, Robustness
  • Manageable and Flexible Costs