Issue: Date: 2026-01-30T16:34:30-08:00


Summary

DuckLake is an integrated lakehouse format that layers metadata management on top of standard Parquet files. It is not a server architecture; it ships as a DuckDB extension that runs inside your local DuckDB process. DuckDB itself handles all query execution and data processing locally.

Main Pieces

1. Metadata Catalog Database

  • Stores all table schemas, column definitions, partitions, snapshots, and file pointers
  • Stored in a DuckDB file (e.g., my_ducklake.ducklake)
  • Can alternatively use PostgreSQL, MySQL, or other SQL databases with ACID transaction support
  • Manages versioning and time-travel capabilities
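
To make the idea of "snapshots and file pointers" concrete, here is a minimal sketch of how a SQL catalog can answer "which files make up this table as of snapshot N?". It uses Python's built-in sqlite3 module as a stand-in catalog. The table and column names (snapshot, data_file, begin_snapshot, end_snapshot) and the file names are illustrative assumptions, heavily simplified, and are not DuckLake's actual catalog schema.

```python
import sqlite3

# Hypothetical, heavily simplified catalog schema (not DuckLake's real tables).
SCHEMA = """
CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, note TEXT);
CREATE TABLE data_file (
    path           TEXT NOT NULL,
    table_name     TEXT NOT NULL,
    begin_snapshot INTEGER NOT NULL,  -- first snapshot that sees the file
    end_snapshot   INTEGER            -- first snapshot that no longer sees it (NULL = still live)
);
"""

def files_at(con, table_name, snapshot_id):
    """Return the file paths visible to `table_name` as of `snapshot_id`."""
    rows = con.execute(
        """
        SELECT path FROM data_file
        WHERE table_name = ?
          AND begin_snapshot <= ?
          AND (end_snapshot IS NULL OR end_snapshot > ?)
        ORDER BY path
        """,
        (table_name, snapshot_id, snapshot_id),
    ).fetchall()
    return [path for (path,) in rows]

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany(
    "INSERT INTO snapshot VALUES (?, ?)",
    [(1, "initial load"), (2, "append"), (3, "rewrite of a.parquet")],
)
con.executemany(
    "INSERT INTO data_file VALUES (?, ?, ?, ?)",
    [
        ("a.parquet", "events", 1, 3),     # superseded in snapshot 3
        ("b.parquet", "events", 2, None),  # still live
        ("c.parquet", "events", 3, None),  # still live
    ],
)

latest = files_at(con, "events", 3)   # current view of the table
as_of_2 = files_at(con, "events", 2)  # "time travel" to snapshot 2
```

Because old file pointers are never deleted, only closed off with an end snapshot, reading an earlier version is just the same query with a smaller snapshot number.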

2. File Storage Layer

  • Actual data stored as Parquet files in a designated directory (e.g., my_ducklake.ducklake.files)
  • Can be local disk or remote object storage (S3, GCS, etc.)
  • Uses an immutable, append-only design: updates create new files and deletion records rather than modifying existing files
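
The append-only behavior can be illustrated with a small in-memory model. This is a conceptual sketch only: the DataFile and AppendOnlyTable classes, the file naming, and the (file name, row index) form of a deletion record are assumptions made for illustration, and no real Parquet files are written.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataFile:
    """An immutable data file: once written, its rows never change."""
    name: str
    rows: tuple

@dataclass
class AppendOnlyTable:
    data_files: list = field(default_factory=list)
    deleted: set = field(default_factory=set)  # (file name, row index) pairs

    def insert(self, rows):
        """Write a brand-new file; existing files are left alone."""
        name = f"data_{len(self.data_files)}.parquet"  # hypothetical naming
        self.data_files.append(DataFile(name, tuple(rows)))
        return name

    def delete_where(self, predicate):
        """Record deletions next to the files instead of rewriting them."""
        for f in self.data_files:
            for i, row in enumerate(f.rows):
                if (f.name, i) not in self.deleted and predicate(row):
                    self.deleted.add((f.name, i))

    def update_where(self, predicate, change):
        """An update is a deletion record plus a new file with the new rows."""
        new_rows = [change(r) for r in self.scan() if predicate(r)]
        self.delete_where(predicate)
        if new_rows:
            self.insert(new_rows)

    def scan(self):
        """Read every file, skipping rows that have a deletion record."""
        return [
            row
            for f in self.data_files
            for i, row in enumerate(f.rows)
            if (f.name, i) not in self.deleted
        ]

table = AppendOnlyTable()
table.insert([{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
first_file = table.data_files[0]
table.update_where(lambda r: r["id"] == 2, lambda r: {**r, "qty": 8})
```

After the update, the table holds two files and one deletion record, and the first file still contains the old row untouched. That untouched history is what lets older snapshots remain readable.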

3. DuckLake Extension

  • A DuckDB extension that acts as the bridge between metadata and storage
  • Installed with INSTALL ducklake, then attached within DuckDB via ATTACH 'ducklake:metadata.ducklake'
  • Transparently manages all catalog operations and data access through standard SQL
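
As a rough sketch of what attaching looks like from Python, the helper below issues the SQL statements in order against any object that has an execute(sql) method, such as a connection returned by duckdb.connect(). The helper name attach_ducklake, the alias my_ducklake, and the data_files/ path are assumptions for illustration. A recording stand-in is used here so the sketch runs without DuckDB installed.

```python
def attach_ducklake(con, catalog="metadata.ducklake", alias="my_ducklake", data_path=None):
    """Install, load and attach a DuckLake catalog, then make it the default.

    `con` is anything with an execute(sql) method, e.g. duckdb.connect().
    Values are interpolated directly, so only pass trusted strings.
    """
    options = f" (DATA_PATH '{data_path}')" if data_path else ""
    statements = [
        "INSTALL ducklake",
        "LOAD ducklake",
        f"ATTACH 'ducklake:{catalog}' AS {alias}{options}",
        f"USE {alias}",
    ]
    for sql in statements:
        con.execute(sql)
    return statements

class RecordingConnection:
    """Stand-in for a DuckDB connection that only records the SQL it is given."""
    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)

con = RecordingConnection()
attach_ducklake(con, data_path="data_files/")
```

With a real DuckDB connection in place of RecordingConnection, ordinary CREATE TABLE, INSERT and SELECT statements would then go through the attached catalog.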

Is There a Server?

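No dedicated server component. DuckLake runs entirely within the DuckDB process as an extension. Multiple DuckDB instances can attach to the same DuckLake database for “multiplayer” concurrent access, something vanilla DuckDB doesn’t natively support. There is no DuckLake server coordinating them; concurrent clients rely on the catalog database’s ACID transactions instead. In practice this means multi-client setups generally need a catalog such as PostgreSQL or MySQL rather than a single DuckDB file, and that catalog database is then the only server-like component involved.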
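
One way a transactional catalog can arbitrate between writers without a coordinating server is optimistic concurrency: each writer tries to publish the next snapshot number, and the database's uniqueness guarantee decides who wins. The sketch below shows that generic pattern with sqlite3. It is an assumption-laden illustration, not a description of DuckLake's actual commit protocol, and both "writers" share one in-memory connection purely to keep the example self-contained.

```python
import sqlite3

def latest_snapshot(con):
    """Return the highest committed snapshot id, or 0 if there is none."""
    return con.execute(
        "SELECT COALESCE(MAX(snapshot_id), 0) FROM snapshot"
    ).fetchone()[0]

def try_commit(con, based_on, note):
    """Try to publish snapshot `based_on + 1`.

    Returns False if another writer already claimed that snapshot id.
    """
    try:
        with con:  # one catalog transaction: commit on success, roll back on error
            con.execute(
                "INSERT INTO snapshot VALUES (?, ?)", (based_on + 1, note)
            )
        return True
    except sqlite3.IntegrityError:
        return False

catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, note TEXT)")

# Two writers read the same starting point, then both try to commit.
seen_by_a = latest_snapshot(catalog)
seen_by_b = latest_snapshot(catalog)
a_ok = try_commit(catalog, seen_by_a, "writer A")
b_ok = try_commit(catalog, seen_by_b, "writer B")  # loses the race
b_retry_ok = try_commit(catalog, latest_snapshot(catalog), "writer B, retried")
```

The losing writer does not corrupt anything: its transaction is rolled back, and it can re-read the latest snapshot and try again.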

Does Local DuckDB Do All the Work?

Yes. Your local DuckDB instance handles all query execution, parsing, optimization, and data processing. The DuckLake extension simply:

  • Translates SQL operations to metadata updates
  • Manages the catalog database
  • Points DuckDB to the correct Parquet files
  • Provides functions for snapshots and time-travel queries

This architecture means compute scales with the machine running DuckDB, while storage can live wherever the catalog and Parquet files are reachable, from local disk to remote object storage.

