Programmatic Ad Sales delivery and billing reporting codebase
Fix and refactor the legacy codebase for speed, bug fixes, and feature enhancements
Existing Challenges
Debugging BI logic on big data required live debugging of Python pandas DataFrames, i.e., analysts couldn’t view and debug reporting issues without a developer
Solutions
Reworked the business dataset analysis workflow from Python pandas DataFrames to Glue/Athena tables
Benefits
Analysts can now work directly with the data via SQL against Glue DB tables. Once they find a fix, a developer integrates the logic through the normal sprint workflow
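As a rough illustration of the analyst-facing workflow, the sketch below uses Python's built-in sqlite3 as a local stand-in for an Athena query over a Glue table. The table name `delivery_daily`, its columns, the rows, and the function `flag_delivery_discrepancies` are all hypothetical and were invented for the example. The point is that a delivery-vs-booked check becomes a plain SQL query that an analyst can rerun and tweak. It no longer requires a breakpoint inside pandas code.

```python
import sqlite3

# Hypothetical stand-in for a Glue/Athena table. In Athena, the same SELECT
# would run against the Glue Data Catalog instead of an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE delivery_daily (
           campaign_id TEXT,
           day TEXT,
           booked_impressions INTEGER,
           delivered_impressions INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO delivery_daily VALUES (?, ?, ?, ?)",
    [  # illustrative rows only
        ("c1", "2024-01-01", 1000, 1000),
        ("c1", "2024-01-02", 1000, 850),
        ("c2", "2024-01-01", 500, 520),
    ],
)

# Analyst-editable SQL: campaigns whose delivery deviates from booking
# by more than the given tolerance (a fraction of booked impressions).
DISCREPANCY_SQL = """
SELECT campaign_id,
       SUM(booked_impressions) AS booked,
       SUM(delivered_impressions) AS delivered,
       ROUND(100.0 * SUM(delivered_impressions) / SUM(booked_impressions), 1) AS pct_delivered
FROM delivery_daily
GROUP BY campaign_id
HAVING ABS(SUM(delivered_impressions) - SUM(booked_impressions))
       > ? * SUM(booked_impressions)
ORDER BY campaign_id
"""

def flag_delivery_discrepancies(conn, tolerance=0.05):
    """Return (campaign_id, booked, delivered, pct_delivered) rows outside tolerance."""
    return conn.execute(DISCREPANCY_SQL, (tolerance,)).fetchall()

print(flag_delivery_discrepancies(conn))
# [('c1', 2000, 1850, 92.5)]
```

Changing the tolerance or the HAVING clause requires no developer involvement. Only the final, agreed logic needs to move into the codebase.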
Goal was to replace the Databricks vendor platform and Redshift-centric ecosystem with a native AWS EMR, Jupyter Notebooks, Hive-centric, Redshift-datamart ecosystem.
Improve data analysis access, and ETL speed, reliability, and cost effectiveness
Medallion (bronze, silver, gold) data triage for the engagement, ad-sales, and content data domains
Existing Challenges
Disparate Data Access: Can’t easily access and query data across domains
Database Issues: Slow queries; queries lock up or time out; DB load contention; usage limits
Solutions
AWS EMR with dbt Spark ETL. EMR Jupyter Notebooks for analysis. GitHub Actions with Terraform for CI/CD.
S3 with Glue databases for lake storage. Redshift datamart for reporting storage
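The medallion triage can be sketched in plain Python to show what each layer is responsible for. The sketch assumes the following layer roles:
- Bronze holds raw records as landed.
- Silver holds cleaned, typed, deduplicated records.
- Gold holds business-level aggregates ready for the Redshift datamart.

The record fields (`event_id`, `domain`, `value`) and the functions `to_silver` and `to_gold` are hypothetical. The actual pipeline described here uses dbt models on Spark, not this code.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SilverEvent:
    # Hypothetical cleaned record shape for the silver layer.
    event_id: str
    domain: str
    value: float

def to_silver(bronze_rows):
    """Bronze -> silver: drop malformed rows, normalize types, dedupe on event_id."""
    seen = set()
    silver = []
    for row in bronze_rows:
        event_id = row.get("event_id")
        domain = row.get("domain")
        try:
            value = float(row.get("value"))
        except (TypeError, ValueError):
            continue  # malformed value: the row stays in bronze only
        if not event_id or not domain or event_id in seen:
            continue  # missing keys or duplicate event
        seen.add(event_id)
        silver.append(SilverEvent(event_id, domain.strip().lower(), value))
    return silver

def to_gold(silver_events):
    """Silver -> gold: aggregate values per data domain for reporting."""
    totals = defaultdict(float)
    for event in silver_events:
        totals[event.domain] += event.value
    return dict(sorted(totals.items()))

# Illustrative bronze rows as they might land in S3 (invented data).
bronze = [
    {"event_id": "e1", "domain": "Engagement", "value": "3"},
    {"event_id": "e2", "domain": "ad-sales", "value": "10.5"},
    {"event_id": "e2", "domain": "ad-sales", "value": "10.5"},  # duplicate
    {"event_id": "e3", "domain": "content", "value": "n/a"},    # malformed
    {"event_id": "e4", "domain": "engagement", "value": 2},
]

print(to_gold(to_silver(bronze)))
# {'ad-sales': 10.5, 'engagement': 5.0}
```

Keeping the raw bronze rows untouched means a bad silver rule can be fixed and replayed without re-ingesting from the source systems.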