Category: Finance & Banking
-
Objectives
- Cohesive environment for time-effective Data Science experiments on big data
- Production-capable ML pipelines
Existing Challenges
- MLOps challenges, e.g. IAM policies, per-model-type resource configs, feature stores
- Long experimentation cycle time
- Data Scientists lack independence in data procurement and library setup
- Lack of production-capable processing
- Disparate data sets
Solutions
- Data Scientists do exploratory work and model development in SageMaker Studio
- SageMaker Studio SSO and MFA integration and isolated S3 paths satisfy enterprise DevOps compliance
- Databricks for big-data batch processing and S3 for training dataset storage
- Productionized jobs via the SageMaker Python API, deployable through the existing CI/CD pipeline
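
The combination of isolated S3 paths and CI/CD-deployable jobs can be sketched as follows. This is a minimal illustration, not the project's actual code: the `users/<name>` prefix layout, the `JobSpec` fields, the instance type, and the volume and runtime limits are all assumptions. The dictionary follows the shape of SageMaker's `CreateTrainingJob` request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical per-job settings; none of these values come from the source."""
    job_name: str
    user: str
    image_uri: str
    role_arn: str
    bucket: str
    instance_type: str = "ml.m5.xlarge"  # assumed default
    instance_count: int = 1


def user_prefix(bucket: str, user: str) -> str:
    """Isolated S3 prefix for one data scientist (assumed layout)."""
    return f"s3://{bucket}/users/{user}"


def training_job_request(spec: JobSpec) -> dict:
    """Build keyword arguments shaped like a CreateTrainingJob request."""
    base = user_prefix(spec.bucket, spec.user)
    return {
        "TrainingJobName": spec.job_name,
        "AlgorithmSpecification": {
            "TrainingImage": spec.image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": spec.role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"{base}/train/",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # Model artifacts stay under the same user-scoped prefix.
        "OutputDataConfig": {"S3OutputPath": f"{base}/models/"},
        "ResourceConfig": {
            "InstanceType": spec.instance_type,
            "InstanceCount": spec.instance_count,
            "VolumeSizeInGB": 50,  # assumed size
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed limit
    }
```

A CI/CD step could then pass the result to `boto3.client("sagemaker").create_training_job(**request)`. Keeping the request a plain dictionary lets it be reviewed and unit-tested without AWS credentials.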
Benefits
- Time-effective data science work
- Productionized models maintainable by staff DE and Ops teams
-
Objectives
- AWS hosted platform focused on the storage, processing, and presentation of customer and product data
- Databricks (Spark vendor) implementation and workflow
Existing Challenges
- DB Storage costs
- Long Pipeline runtimes
- Data accessibility and separation across multiple business units
- PII, SOX, etc. regulatory compliance
- Public enterprise IT and infosec compliance
Solutions
- Databricks for Spark big data processing, Jupyter Notebooks analysis, Hive tables on S3 for SQL
- S3 for Data Lake source of truth storage and Utility staging and processing storage
- Redshift for Data Warehouse availability to different business unit dashboarding and reports
- ECS Containers for bespoke native codebase applications
- Terraform IaC
- Jenkins code releases and env separation
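
As a rough sketch of how Hive tables on S3 can keep business-unit data separated, the helper below generates DDL for an external Parquet table under a per-unit prefix. The bucket name, prefix layout, and function name are hypothetical and not taken from the project.

```python
from typing import Sequence, Tuple


def hive_external_table_ddl(
    database: str,
    table: str,
    columns: Sequence[Tuple[str, str]],
    bucket: str,
    business_unit: str,
) -> str:
    """Return Hive-format DDL for an external Parquet table on S3.

    Each business unit gets its own top-level prefix (assumed layout),
    so bucket policies can grant access per prefix.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    location = f"s3://{bucket}/{business_unit}/{database}/{table}/"
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )
```

In a Databricks notebook the returned string could be run with `spark.sql(ddl)`. Because the table is external, dropping it removes only the metadata and leaves the S3 data lake files in place.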
Benefits
- Analyst and DE accessible
- Scalable
- Enterprise compliant
- Cost manageable
- Flexible