Module nebari::tree

source · []
Expand description

Append-only B-Tree implementation

The file format is loosely inspired by Couchstore. Nebari is not compatible with Couchstore in any way.

Numbers and Alignment

  • All numbers are encoded in big-endian format/network byte order.
  • All values are tightly packed. There is no padding or alignment that isn’t explicitly included.

File Organization

There is no way to read this file format starting at byte 0 and iterating forward. The contents of any given byte offset are unknown until the file’s current root header has been found.

When writing data to the file, it will be appended to the end of the file. When a tree is committed, all of the changed nodes will be appended to the end of the file, except for the Root.

Before writing the Root, the file is padded to a multiple of PAGE_SIZE. A 3-byte magic code is written, followed by a byte for the PageHeader.

The Root is then serialized and written as a chunk.

To locate the most recent header, take the file’s length and find the largest multiple of PAGE_SIZE. Check the first three bytes at that offset for the magic code. If found, attempt to read a chunk. If successful, attempt to deserialize the Root.

If any step fails, loop back through the file at each PAGE_SIZE offset until a valid header is found.

Chunks

Each time a value, B-Tree node, or header is written, it is written as a chunk. If a Vault is in-use, each chunk will be pre-processed by the vault before a CRC-32-BZIP2 checksum is calculated. A chunk is limited to 4 gigabytes of data (2^32).

The chunk is written as:

  • u32 - Data length, excluding the header.
  • u32 - CRC
  • [u8] - Contents

Structs

An active state for a tree file.

A B-Tree entry that stores a list of key-Index pairs.

A borrowed range in byte form.

The reduced index of both VersionedByIdIndex and UnversionedByIdIndex

The reduced index of BySequenceIndex.

A wrapper for a CompareSwapFn.

An interior B-Tree node. Does not contain values directly, and instead points to a node located on-disk elsewhere.

An entry for a key. Stores a single index value for a single key.

One or more keys.

A stored revision of a key.

A tree modification.

A single key’s modification result.

Writes data in pages, allowing for quick scanning through the file.

A stored entry in a versioned tree.

A unique ID of a single modification to a key in a versioned tree file.

A stored index in a versioned tree.

The current state of a tree file. Must be initialized before passing to TreeFile::new if the file already exists.

A compaction process that runs in concert with a transaction manager.

An append-only tree file.

A named tree with a specific root type.

A range of u64 values that is able to be used as keys in a tree scan, once borrowed.

A versioned B-Tree root. This tree root internally uses two btrees, one to keep track of all writes using a unique “sequence” ID, and one that keeps track of all key-value pairs.

A versioned B-Tree root. This tree root internally uses two btrees, one to keep track of all writes using a unique “sequence” ID, and one that keeps track of all key-value pairs.

Enums

A B-Tree entry that stores a list of key-Index pairs.

An operation to perform on a key.

An operation that is performed on a set of keys.

The header byte for a tree file’s page.

Controls the persistence guarantees of write operations.

A pointer to a location on-disk. May also contain the node already loaded.

The result of evaluating a key or node that was scanned.

Constants

The number of bytes in each page on-disk.

Traits

A named tree that can be used in a transaction.

Borrows a range.

An index that is embeddable within a tree.

Creates an Index from a key and value.

Reduces one or more Indexes or instances of Self into a single Self value.

A B-Tree root implementation.

A type that can be serialized and deserialized.

An index that serializes a value to the file.

Type Definitions

A function that is allowed to check the current value of a key and determine how to operate on it. The first parameter is the key, and the second parameter is the current value, if present.

An unversioned tree with no additional indexed data.

An versioned tree with no additional indexed data.