Advanced Databricks Course Outline
Duration: 3 days
Module 1: Deep Spark Internals
Catalyst optimizer: logical plans, physical plans, and query execution
Tungsten execution engine and whole-stage code generation
Memory management: on-heap vs. off-heap, spill behavior
Custom partitioners and repartitioning strategies
Debugging complex DAGs and understanding shuffle internals
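The partitioning topics above rest on one idea: a partitioner maps each key to a partition index, and a skewed key distribution concentrates rows (and shuffle work) on a few partitions. A minimal pure-Python sketch of the hash-partitioning scheme Spark's default `HashPartitioner` uses conceptually; the key values and partition count are illustrative, not from the course:

```python
def non_negative_mod(x: int, mod: int) -> int:
    """Map any integer hash (possibly negative) to a valid partition index."""
    r = x % mod
    return r + mod if r < 0 else r

def assign_partition(key, num_partitions: int) -> int:
    """Hash-partition a key, mirroring the idea behind Spark's HashPartitioner."""
    return non_negative_mod(hash(key), num_partitions)

# A skewed key lands every one of its rows in the same partition -- the
# motivation for custom partitioners or key salting:
keys = ["user_1"] * 1000 + ["user_2", "user_3"]
counts: dict[int, int] = {}
for k in keys:
    p = assign_partition(k, 8)
    counts[p] = counts.get(p, 0) + 1
print(max(counts.values()))  # at least 1000: all "user_1" rows share a partition
```

A custom partitioner replaces `assign_partition` with domain-aware logic (for example, salting hot keys across several indices) so shuffle load spreads more evenly.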
Module 2: Advanced Streaming Architectures
Structured Streaming internals: micro-batch vs. continuous processing
Watermarking, late data handling, and stateful aggregations
Stream-stream joins and windowing strategies
Kafka integration: exactly-once semantics and offset management
Lab: Build a fault-tolerant streaming pipeline with complex stateful logic
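The watermarking topic in this module comes down to a simple invariant: the engine tracks the maximum event time seen and drops (or finalizes state for) events older than that maximum minus a configured delay. A pure-Python sketch of that bookkeeping, with a made-up 10-minute delay and example timestamps; the real mechanism in Structured Streaming is `withWatermark` on a streaming DataFrame:

```python
from datetime import datetime, timedelta

class WatermarkTracker:
    """Illustrative model of watermark-based late-data handling."""

    def __init__(self, delay: timedelta):
        self.delay = delay
        self.max_event_time: datetime | None = None

    def accept(self, event_time: datetime) -> bool:
        """Return True if the event is on time relative to the watermark."""
        if self.max_event_time is None:
            self.max_event_time = event_time
            return True
        on_time = event_time >= self.max_event_time - self.delay
        if event_time > self.max_event_time:
            self.max_event_time = event_time
        return on_time

tracker = WatermarkTracker(delay=timedelta(minutes=10))
t0 = datetime(2024, 1, 1, 12, 0)
print(tracker.accept(t0))                          # True: first event
print(tracker.accept(t0 + timedelta(minutes=30)))  # True: advances the watermark
print(tracker.accept(t0 + timedelta(minutes=5)))   # False: more than 10 min late
```

The same trade-off the course covers falls out of the sketch: a longer delay tolerates more out-of-order data but keeps aggregation state alive longer.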
Module 3: Advanced Delta Lake Patterns
Delta Lake internals: Parquet footers, transaction protocol, and compaction
Multi-cluster writes and concurrency control
Liquid clustering and deletion vectors
Building custom Delta connectors and integrations
Performance tuning: file sizing, Z-order column selection, and bloom filters
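Z-order column selection is easier to reason about once you see what a Z-order (Morton) key is: the bits of the chosen columns interleaved, so rows near each other in the key are near each other in every clustered column, which is what lets file-level statistics skip data on any of them. A small illustrative sketch of the bit interleaving (the row values are made up; Delta's actual implementation handles arbitrary types and ranges):

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two non-negative column values (Morton code)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # bits of x at even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # bits of y at odd positions
    return key

# Sorting rows by the interleaved key co-locates rows that are close in
# BOTH columns, unlike a plain sort on one column:
rows = [(3, 5), (9, 1), (3, 6), (2, 5)]
print(sorted(rows, key=lambda r: z_order_key(*r)))
```

This also shows why adding many columns to `ZORDER BY` dilutes its benefit: each extra column claims a share of the key's bits, weakening locality on every column.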
Module 4: Lakehouse Platform Architecture
Multi-workspace deployment patterns and cross-workspace access
Unity Catalog advanced topics: external locations, storage credentials, and federation
Data mesh implementation with Databricks
Hybrid and multi-cloud Lakehouse strategies
Disaster recovery and high availability patterns
Module 5: MLOps & Model Serving
MLflow advanced features: model registry, webhooks, and custom flavors
Feature Store design patterns and point-in-time lookups
Model serving: real-time endpoints, A/B testing, and canary deployments
Monitoring model drift and automated retraining pipelines
Lab: Deploy an end-to-end ML pipeline with automated retraining
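The point-in-time lookup pattern from this module prevents training-serving leakage: for each training event, the join must return the most recent feature value at or before the event timestamp, never a later one. A minimal pure-Python sketch of that semantics (the timestamps and values are made up; Databricks Feature Store performs this as a point-in-time join over timestamp keys):

```python
import bisect

def point_in_time_lookup(feature_history, event_ts):
    """Return the latest feature value whose timestamp is <= event_ts.

    feature_history: list of (ts, value) pairs sorted ascending by ts.
    Returns None if no feature value existed yet at event time.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect.bisect_right(timestamps, event_ts) - 1
    if idx < 0:
        return None
    return feature_history[idx][1]

history = [(100, 0.2), (200, 0.5), (300, 0.9)]
print(point_in_time_lookup(history, 250))  # 0.5 -- the 300 value is "future" data
print(point_in_time_lookup(history, 50))   # None -- no feature existed yet
```

Using `bisect_right` rather than `bisect_left` is the detail that makes a feature stamped exactly at the event time eligible, which matches the usual point-in-time-correct definition.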
Module 6: Cost Optimization & Resource Management
Cluster pools and instance fleet strategies
Photon engine: when to use and performance considerations
Serverless compute: SQL warehouses and serverless jobs
Cost attribution with tags and chargeback models
Query profiling and warehouse sizing for optimal price/performance
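The chargeback idea above is mechanically simple: roll usage up by a cost-attribution tag and price it. A sketch of that aggregation in pure Python; the `cost_center` tag name, the blended DBU rate, and the usage records are all illustrative assumptions, not Databricks defaults (real usage data would come from system billing tables or usage logs):

```python
from collections import defaultdict

DBU_RATE_USD = 0.55  # hypothetical blended $/DBU for illustration

usage_records = [
    {"tags": {"cost_center": "marketing"}, "dbus": 120.0},
    {"tags": {"cost_center": "finance"},   "dbus": 80.0},
    {"tags": {},                           "dbus": 40.0},  # untagged workload
]

def chargeback(records, rate=DBU_RATE_USD):
    """Sum DBU cost per cost-center tag; untagged usage gets its own bucket."""
    totals = defaultdict(float)
    for rec in records:
        center = rec["tags"].get("cost_center", "untagged")
        totals[center] += rec["dbus"] * rate
    return {center: round(cost, 2) for center, cost in totals.items()}

print(chargeback(usage_records))
# {'marketing': 66.0, 'finance': 44.0, 'untagged': 22.0}
```

The "untagged" bucket is the practical payoff: its size shows how much spend your tagging policy fails to attribute, which is usually the first thing a chargeback rollout has to drive down.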
Module 7: Enterprise Security & Compliance
Advanced Unity Catalog governance: attribute-based access control and data lineage
Dynamic views for fine-grained access control
Audit logging and compliance reporting
Private Link, VPC peering, and network isolation patterns
Customer-managed keys and data encryption strategies
Module 8: Advanced DevOps & Automation
Databricks Asset Bundles for infrastructure-as-code
Terraform provider deep dive and state management
REST API automation and custom integrations
Advanced testing patterns: integration tests, data quality frameworks
Lab: Build a complete CI/CD pipeline with Asset Bundles and automated testing
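For the REST API automation topic, the recurring pattern is scripting job operations by constructing an authenticated JSON request. A hedged sketch that builds the request body and headers for a Jobs API `run-now`-style call; the job ID, token, and parameter names are placeholders, and the actual HTTP call (e.g. via `requests`) is omitted so the sketch stays self-contained and offline:

```python
import json

def run_now_payload(job_id: int, notebook_params: dict) -> str:
    """Serialize a run-now style request body for the Jobs REST API."""
    return json.dumps({"job_id": job_id, "notebook_params": notebook_params})

def auth_headers(token: str) -> dict:
    """Bearer-token headers used by Databricks REST API calls."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

body = run_now_payload(1234, {"env": "staging"})
print(body)  # {"job_id": 1234, "notebook_params": {"env": "staging"}}
```

Keeping payload construction in small pure functions like these is also what makes the API automation testable in CI without touching a live workspace, which ties into the lab's automated-testing theme.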
Prerequisites
Completion of Intermediate Databricks course (or equivalent experience)
Strong Spark/PySpark proficiency
Experience with Delta Lake and Unity Catalog
Familiarity with Databricks Workflows and production deployments