Detailed course notes

Workspaces and Services

Delta Lake

Relational entities

ETL with Spark SQL

Just Enough Python for Spark SQL

Incremental data processing

Multi-hop architecture

Delta Live Tables

Workflows

Databricks SQL

Permissions

Exam revision, additional notes

Databricks = Data Lakehouse. Built on Delta Lake (the storage layer), with Apache Spark as the compute engine. Committed to open source, and to serving multiple disciplines on one platform.

The commands in this notebook are a really useful way to understand Delta Lake and the files behind a table. In particular:

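OPTIMIZE (compacts many small files into fewer, larger ones; it does not change the table's partitioning)
ZORDER BY (a clause of OPTIMIZE, not a standalone command; co-locates related rows in the same files using a Z-order curve, so filters on that column can skip more files - pick a high-cardinality column you filter on often)
DESCRIBE HISTORY (one row per table version, with timestamp and operation; can roll back with RESTORE TABLE <table> TO VERSION AS OF <n>)
DESCRIBE EXTENDED (column schema of the table, plus partition columns and table metadata such as location and type)
DESCRIBE DETAIL (metadata on the table, e.g. file path, size in bytes, number of files, partition columns)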
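
To make the commands above concrete, here is a minimal sketch that builds each statement as a string. The table name `students`, the column `id`, and the version number `3` are made up for illustration, and the helper `delta_maintenance_statements` is hypothetical rather than part of any library. In a Databricks notebook, where a `spark` session is already defined, each string could be passed to `spark.sql(...)`; here they are only printed.

```python
def delta_maintenance_statements(table, zorder_col, restore_version):
    """Build the Delta Lake SQL statements from the notes above.

    All arguments are illustrative placeholders supplied by the caller.
    """
    return [
        # Compact small files and co-locate rows by the chosen column.
        f"OPTIMIZE {table} ZORDER BY ({zorder_col})",
        # List table versions (version number, timestamp, operation).
        f"DESCRIBE HISTORY {table}",
        # Column schema plus partition and table-level metadata.
        f"DESCRIBE EXTENDED {table}",
        # Location, size in bytes, number of files, partition columns.
        f"DESCRIBE DETAIL {table}",
        # Roll the table back to an earlier version from the history.
        f"RESTORE TABLE {table} TO VERSION AS OF {restore_version}",
    ]


# Hypothetical table, column and version, chosen only for the example.
statements = delta_maintenance_statements("students", "id", 3)

for stmt in statements:
    # In a Databricks notebook this could be: spark.sql(stmt).show()
    print(stmt)
```

Running DESCRIBE DETAIL before and after OPTIMIZE is a simple way to see the file count change, and DESCRIBE HISTORY afterwards shows the OPTIMIZE as its own table version.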