Expand description
Append-only B-Tree implementation
The file format is loosely inspired by Couchstore. Nebari is not compatible with Couchstore in any way.
Numbers and Alignment
- All numbers are encoded in big-endian format/network byte order.
- All values are tightly packed. There is no padding or alignment that isn’t explicitly included.
File Organization
There is no way to read this file format starting at byte 0 and iterating forward. The contents of any given byte offset are unknown until the file’s current root header has been found.
When writing data to the file, it will be appended to the end of the file. When a tree is committed, all of the changed nodes will be appended to the end of the file, except for the Root.
Before writing the Root, the file is padded to a multiple of
PAGE_SIZE
. A 3-byte magic code is written, followed by a byte for the
PageHeader
.
The Root is then serialized and written as a chunk.
To locate the most recent header, take the file’s length and find the
largest multiple of PAGE_SIZE
. Check the first three bytes at that
offset for the magic code. If found, attempt to read a chunk. If successful,
attempt to deserialize the Root.
If any step fails, loop back through the file at each PAGE_SIZE
offset
until a valid header is found.
Chunks
Each time a value, B-Tree node, or header is written, it is written as a
chunk. If a Vault
is in-use, each chunk will be
pre-processed by the vault before a CRC-32-BZIP2
checksum is calculated. A
chunk is limited to 4 gigabytes of data (2^32).
The chunk is written as:
u32
- Data length, excluding the header.u32
- CRC[u8]
- Contents
Structs
An active state for a tree file.
A B-Tree entry that stores a list of key-Index
pairs.
A borrowed range in byte form.
Indexes and Reduces VersionedByIdIndex
and UnversionedByIdIndex
.
Contains an EmbeddedIndex
.
The reduced index of both VersionedByIdIndex
and UnversionedByIdIndex
The index stored within VersionedTreeRoot::by_sequence_root
.
The reduced index of BySequenceIndex
.
A wrapper for a CompareSwapFn
.
An interior B-Tree node. Does not contain values directly, and instead points to a node located on-disk elsewhere.
An entry for a key. Stores a single index value for a single key.
One or more keys.
A stored revision of a key.
A tree modification.
A single key’s modification result.
Writes data in pages, allowing for quick scanning through the file.
A stored entry in a versioned tree.
A unique ID of a single modification to a key in a versioned tree file.
A stored index in a versioned tree.
The current state of a tree file. Must be initialized before passing to
TreeFile::new
if the file already exists.
A compaction process that runs in concert with a transaction manager.
An append-only tree file.
A named tree with a specific root type.
The index stored within UnversionedTreeRoot::by_id_root
.
A versioned B-Tree root. This tree root internally uses two btrees, one to keep track of all writes using a unique “sequence” ID, and one that keeps track of all key-value pairs.
The index stored within VersionedTreeRoot::by_id_root
.
A versioned B-Tree root. This tree root internally uses two btrees, one to keep track of all writes using a unique “sequence” ID, and one that keeps track of all key-value pairs.
Enums
A B-Tree entry that stores a list of key-Index
pairs.
An operation to perform on a key.
An operation that is performed on a set of keys.
The header byte for a tree file’s page.
Controls the persistence guarantees of write operations.
A pointer to a location on-disk. May also contain the node already loaded.
The result of evaluating a key or node that was scanned.
Constants
The number of bytes in each page on-disk.
Traits
A named tree that can be used in a transaction.
Borrows a range.
An index that is embeddable within a tree.
Creates an Index
from a key and value.
Reduces one or more Index
es or instances of Self
into a single Self
value.
A B-Tree root implementation.
A type that can be serialized and deserialized.
An index that serializes a value to the file.
Type Definitions
A function that is allowed to check the current value of a key and determine how to operate on it. The first parameter is the key, and the second parameter is the current value, if present.
An unversioned tree with no additional indexed data.
An versioned tree with no additional indexed data.