Category: Finance & Banking

  • AWS Data Science Analysis and ML Pipeline Platform: Databricks Spark, SageMaker

    Objectives

    • Cohesive environment for time-effective Data Science experiments on big data
    • Production-capable ML pipelines

    Existing Challenges

    • MLOps challenges, e.g. IAM policies, resource configuration per model type, feature stores
    • Long experimentation cycle time
    • Data scientists lack independence in data procurement and library setup
    • No production-capable processing environment
    • Disparate data sets

    Solutions

    • Data scientists do exploratory work and model development in SageMaker Studio
    • SageMaker Studio SSO and MFA integration, together with isolated S3 paths, satisfies enterprise DevOps compliance
    • Databricks for big-data batch processing; S3 for training dataset storage
    • Productionized jobs defined through the SageMaker Python SDK and deployable via the existing CI/CD pipeline
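    The "isolated S3 paths" item above can be illustrated with a per-user IAM policy document. This is a minimal sketch, not the project's actual policy: the bucket name, the prefix layout, and the helper name `isolated_s3_policy` are all hypothetical, and a real deployment would likely add further conditions.

```python
import json


def isolated_s3_policy(bucket: str, user_prefix: str) -> dict:
    """Build an IAM policy document that limits access to one S3 prefix.

    `bucket` and `user_prefix` are hypothetical placeholders.
    """
    prefix = user_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only the keys under the user's own prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow object reads and writes only under that prefix.
                "Sid": "ReadWriteOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and user path, for illustration only.
    policy = isolated_s3_policy("example-ds-bucket", "users/alice")
    print(json.dumps(policy, indent=2))
```

    Generating the document from a function keeps one policy shape for every data scientist, so the only per-user input is the prefix.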

    Benefits

    • Time-effective data science work
    • Productionized models maintainable by in-house data engineering (DE) and Ops teams

  • Big Data and Financials Analysis Platform

    Objectives

    • AWS-hosted platform focused on the storage, processing, and presentation of customer and product data
    • Implementation and workflows on Databricks as the Spark vendor

    Existing Challenges

    • Database storage costs
    • Long pipeline runtimes
    • Data accessibility and separation across multiple business units
    • Regulatory compliance (PII, SOX, etc.)
    • Public-enterprise IT and infosec compliance

    Solutions

    • Databricks for Spark big-data processing, Jupyter notebook analysis, and Hive tables on S3 for SQL
    • S3 as the data lake source of truth, plus utility storage for staging and processing
    • Redshift as the data warehouse serving dashboards and reports for different business units
    • ECS containers for bespoke native-codebase applications
    • Terraform for infrastructure as code (IaC)
    • Jenkins for code releases and environment separation
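    One way to picture the S3 layout implied by the list above (a source-of-truth lake, staging and processing storage, separation by business unit, Hive tables on S3) is a key-naming helper. This is only a sketch under assumptions: the zone names, the path order, the `dt=` partition key, and the function name `lake_path` are hypothetical and are not taken from the project.

```python
from datetime import date

# Hypothetical storage zones; the real platform's names are not given.
ZONES = ("raw", "staging", "curated")


def lake_path(bucket: str, zone: str, business_unit: str,
              dataset: str, run_date: date) -> str:
    """Return a Hive-style partitioned S3 prefix for one day's data."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    # key=value directory names are the Hive partitioning convention,
    # which lets SQL engines prune partitions by date.
    return (
        f"s3://{bucket}/{zone}/{business_unit}/{dataset}/"
        f"dt={run_date.isoformat()}/"
    )


if __name__ == "__main__":
    # Hypothetical bucket, business unit, and dataset names.
    print(lake_path("example-lake", "raw", "retail", "transactions",
                    date(2024, 1, 31)))
```

    Putting the business unit directly under the zone means access rules can be scoped to a single prefix per unit.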

    Benefits

    • Accessible to analysts and data engineers
    • Scalable
    • Enterprise compliant
    • Cost-manageable
    • Flexible