Databricks = Data Lakehouse, underpinned by Delta Lake and Apache Spark. Commitment to open source, and commitment to being cross-discipline, with one platform.
The commands in this notebook are a really useful way to better understand Delta Lake and the files underneath a table. In particular:
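- OPTIMIZE (compacts many small files into fewer, larger ones so reads are faster; it rewrites files rather than changing the table's partitioning)
- ZORDER BY (a clause of OPTIMIZE; co-locates related rows in the same files using a space-filling curve over the columns you specify - pick a col with high cardinality)
- DESCRIBE HISTORY (one row per table version, with the operation that produced it; can roll back using version numbers, e.g. RESTORE TABLE <table> TO VERSION AS OF <n>)
- DESCRIBE EXTENDED (schema of the table, as well as partitioning information and detailed table properties)
- DESCRIBE DETAIL (metadata on the table, e.g. filepath, size, num files, partition cols)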
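
The reordering that ZORDER performs is based on a Z-order (Morton) space-filling curve. As a rough illustration only, and not Delta Lake's actual implementation, the sketch below interleaves the bits of two integer column values into a single sort key. The function name `z_value` and the sample points are made up for this example.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton key.

    Illustrative helper (hypothetical name); assumes non-negative integers.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x land on even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y land on odd positions
    return key


# Made-up (x, y) pairs standing in for two column values per row.
points = [(3, 3), (0, 0), (1, 0), (2, 2), (0, 1), (1, 1)]

# Sorting by the interleaved key keeps rows that are close in BOTH
# columns near each other, which is what makes file skipping effective.
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered)
```

Rows sorted this way end up clustered by both columns at once, so each file covers a narrow range of values and more files can be skipped when a query filters on either column.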