Module history

Handle historical checkpoint data.

Full checkpoint data for all epochs, starting from genesis, is persisted in batches as blob files in a remote store.

Files are optionally compressed with the zstd compression format. Filenames follow the format <checkpoint_seq_num>.chk, where checkpoint_seq_num is the sequence number of the first checkpoint in that file. MANIFEST is the index of, and the source of truth for, all files present in the ingestion source history.
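
The naming rule can be illustrated with a minimal sketch. The helper names below are hypothetical and not part of this module, and the `chk` suffix is taken from the filename format described here rather than from the actual value of CHECKPOINT_FILE_SUFFIX. Because MANIFEST is the source of truth, a lookup based on file names alone should be treated as a hint only.

```rust
/// Suffix taken from the documented filename format (assumed value).
const SUFFIX: &str = "chk";

/// Builds the file name for a batch whose first checkpoint is `first_seq`.
fn checkpoint_file_name(first_seq: u64) -> String {
    format!("{}.{}", first_seq, SUFFIX)
}

/// Recovers the first checkpoint sequence number from a file name,
/// or `None` if the name does not match `<checkpoint_seq_num>.chk`.
fn parse_first_seq(file_name: &str) -> Option<u64> {
    file_name
        .strip_suffix(SUFFIX)?
        .strip_suffix('.')?
        .parse()
        .ok()
}

/// Given the ascending first-sequence numbers of all files, returns the
/// start of the file that would hold checkpoint `seq`, if any file starts
/// at or before it.
fn file_start_for(sorted_starts: &[u64], seq: u64) -> Option<u64> {
    // Number of files whose first checkpoint is <= seq.
    let count = sorted_starts.partition_point(|&start| start <= seq);
    count.checked_sub(1).map(|i| sorted_starts[i])
}
```

With the layout shown in the next section, `file_start_for(&[0, 1000, 3000], 2999)` selects the file starting at checkpoint 1000, so the candidate file would be `1000.chk`.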

Ingestion Source History Directory Layout

 - ingestion/
    - historical/
         - MANIFEST
         - 0.chk
         - 1000.chk
         - 3000.chk
         - ...
         - 100000.chk

Blob File Disk Format
┌──────────────────────────────┐
│       magic <4 byte>         │
├──────────────────────────────┤
│  storage format <1 byte>     │
├──────────────────────────────┤
│    file compression <1 byte> │
├──────────────────────────────┤
│ ┌──────────────────────────┐ │
│ │         Blob 1           │ │
│ ├──────────────────────────┤ │
│ │          ...             │ │
│ ├──────────────────────────┤ │
│ │        Blob N            │ │
│ └──────────────────────────┘ │
└──────────────────────────────┘
Blob
┌───────────────┬───────────────────┬──────────────┐
│ len <uvarint> │ encoding <1 byte> │ data <bytes> │
└───────────────┴───────────────────┴──────────────┘
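
The two diagrams above can be read as a simple parsing procedure. The following is a rough sketch under stated assumptions, not the module's reader: the type and function names are hypothetical, `uvarint` is assumed to be an unsigned LEB128 varint, `len` is assumed to count only the `data` bytes, and the blob section is assumed to be already decompressed. The magic is kept as raw bytes because its value and byte order are not given here.

```rust
/// One decoded blob record: encoding tag plus raw payload (hypothetical type).
#[derive(Debug, PartialEq)]
struct Blob {
    encoding: u8,
    data: Vec<u8>,
}

/// Fixed-size fields that precede the blob sequence (hypothetical type).
#[derive(Debug, PartialEq)]
struct Header {
    magic: [u8; 4],
    storage_format: u8,
    file_compression: u8,
}

/// Reads an unsigned LEB128 varint; returns (value, bytes consumed).
/// This sketch does not reject over-long encodings.
fn read_uvarint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Splits an uncompressed blob file into its header and blob records.
/// Returns `None` if the input is truncated or malformed.
fn parse_blob_file(bytes: &[u8]) -> Option<(Header, Vec<Blob>)> {
    if bytes.len() < 6 {
        return None;
    }
    let mut magic = [0u8; 4];
    magic.copy_from_slice(&bytes[0..4]);
    let header = Header {
        magic,
        storage_format: bytes[4],
        file_compression: bytes[5],
    };

    let mut rest = &bytes[6..];
    let mut blobs = Vec::new();
    while !rest.is_empty() {
        let (len, used) = read_uvarint(rest)?;
        if len > usize::MAX as u64 {
            return None;
        }
        // Assumption: `len` covers the data bytes only, not the encoding byte.
        let encoding = *rest.get(used)?;
        let start = used + 1;
        let end = start.checked_add(len as usize)?;
        let data = rest.get(start..end)?.to_vec();
        blobs.push(Blob { encoding, data });
        rest = &rest[end..];
    }
    Some((header, blobs))
}
```

A caller would still need to compare `header.magic` against the expected file magic and interpret the storage format, compression, and encoding bytes, whose values are not specified in this description.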

MANIFEST File Disk Format
┌──────────────────────────────┐
│       magic <4 byte>         │
├──────────────────────────────┤
│   serialized manifest        │
├──────────────────────────────┤
│      sha3 <32 bytes>         │
└──────────────────────────────┘
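
As an illustration of the MANIFEST layout above, the sketch below splits a file into its three regions and checks the trailing digest. All names are hypothetical. The hash function is passed in by the caller so the example stays free of external crates, and the digest is assumed to cover every byte before it (magic plus serialized manifest), which this description does not confirm.

```rust
/// The three regions of a MANIFEST file, borrowed from the input buffer
/// (hypothetical type).
#[derive(Debug, PartialEq)]
struct ManifestParts<'a> {
    magic: &'a [u8],
    body: &'a [u8],
    digest: &'a [u8],
}

const MAGIC_LEN: usize = 4;
const DIGEST_LEN: usize = 32;

/// Splits raw MANIFEST bytes into magic, serialized manifest, and digest.
/// Returns `None` if the input is too short to hold the fixed-size fields.
fn split_manifest<'a>(bytes: &'a [u8]) -> Option<ManifestParts<'a>> {
    if bytes.len() < MAGIC_LEN + DIGEST_LEN {
        return None;
    }
    let (magic, rest) = bytes.split_at(MAGIC_LEN);
    let (body, digest) = rest.split_at(rest.len() - DIGEST_LEN);
    Some(ManifestParts { magic, body, digest })
}

/// Checks the magic and the trailing digest using a caller-supplied
/// 32-byte hash function (for example, a SHA3-256 implementation).
fn verify_manifest<F>(bytes: &[u8], expected_magic: &[u8; 4], hash: F) -> bool
where
    F: Fn(&[u8]) -> [u8; 32],
{
    match split_manifest(bytes) {
        Some(parts) => {
            // Assumption: the digest covers everything that precedes it.
            let covered = &bytes[..bytes.len() - DIGEST_LEN];
            parts.magic == &expected_magic[..] && &hash(covered)[..] == parts.digest
        }
        None => false,
    }
}
```

Only after such a check succeeds would it make sense to deserialize `body`; the serialization format of the manifest itself is handled by the manifest module and is not shown here.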

Modules§

manifest
Handle the manifest for historical checkpoint data.
reader

Constants§

CHECKPOINT_FILE_MAGIC
CHECKPOINT_FILE_SUFFIX
MAGIC_BYTES
MANIFEST_FILENAME
MANIFEST_FILE_MAGIC