Format of a Pijul repository
When running tree
in a Pijul repository with one patch,
we get the following result:
.pijul
├── changes.wUAdAoNq
├── hooks
├── id
├── local
│ └── ignore
├── meta.toml
├── patches
│ ├── AKm4JUd3Z5LRLpg1jezgmhLWGyzR27Qv9ATDFSb7QquqvtMwqEoNiiuVP1u2ho7V8qbprRCRPsmGY6bumYjoH8JZ.gz
│ └── AKm4JUd3Z5LRLpg1jezgmhLWGyzR27Qv9ATDFSb7QquqvtMwqEoNiiuVP1u2ho7V8qbprRCRPsmGY6bumYjoH8JZ.sig
├── pristine
│ ├── db
│ └── db.lock
└── version
4 directories, 9 files
There are a number of files:
Pristine
This directory contains two files, one called db
and
another one called db.lock
. File db
is a Sanakirja database, which
is essentially a collection of B trees stored in a file. Sanakirja
databases have the following properties:
They are transactional: committing a transaction is an atomic operation, and transactions can be cancelled entirely.
They achieve transactionality using the copy-on-write strategy, which enables readers to read in parallel to one writer. Before a transaction is committed, readers only see the pre-commit version of the database. Moreover, multiple writers still exclude each other.
B trees of size n are clonable in time and space O(log n). In practice, the complexity of cloning is often much lower (i.e. independent from the database size), depending on the history of the database.
The .lock
file is used to exclude multiple processes to
access db
concurrently, since there is no inter-process
transaction synchronisation in Sanakirja at the moment.
Pijul stores a number of tables, defined in module
libpijul::backend
:
pub struct Dbs {
/// A map of the files in the working copy.
tree: sanakirja::Db<self::file_id::UnsafeFileId, self::inode::Inode>,
/// The reverse of tree.
revtree: sanakirja::Db<self::inode::Inode, self::file_id::UnsafeFileId>,
/// A map from inodes (in tree) to keys in branches.
inodes: sanakirja::Db<self::inode::Inode, self::file_header::FileHeader>,
/// The reverse of inodes, minus the header.
revinodes: sanakirja::Db<self::key::Key<PatchId>, self::inode::Inode>,
/// Text contents of keys.
contents: sanakirja::Db<self::key::Key<PatchId>, sanakirja::value::UnsafeValue>,
/// A map from external patch hashes to internal ids.
internal: sanakirja::Db<self::hash::UnsafeHash, self::patch_id::PatchId>,
/// The reverse of internal.
external: sanakirja::Db<self::patch_id::PatchId, self::hash::UnsafeHash>,
/// A reverse map of patch dependencies, i.e. (k,v) is in this map
/// means that v depends on k.
revdep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
/// A map from branch names to graphs.
branches:
sanakirja::Db<self::small_string::UnsafeSmallStr, (NodesDb, PatchSet, RevPatchSet, u64)>,
/// A map of edges to patches that remove them.
cemetery: sanakirja::Db<(self::key::Key<PatchId>, self::edge::Edge), self::patch_id::PatchId>,
/// Dependencies
dep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
/// Files touched by patches.
touched_files: sanakirja::Db<self::key::Key<PatchId>, self::patch_id::PatchId>,
/// Partial checkouts: branch -> partial
partials: sanakirja::Db<self::small_string::UnsafeSmallStr, self::key::Key<PatchId>>,
}
Let's look at each of these tables:
/// A map of the files in the working copy.
tree: sanakirja::Db<self::file_id::UnsafeFileId, self::inode::Inode>,
/// The reverse of tree.
revtree: sanakirja::Db<self::inode::Inode, self::file_id::UnsafeFileId>,
The "tree" and "revtree" tables are used to represent the working directory tree, and assign "inodes", which are just arbitrary integers, to files and directories.
/// A map from inodes (in tree) to keys in branches.
inodes: sanakirja::Db<self::inode::Inode, self::file_header::FileHeader>,
/// The reverse of inodes, minus the header.
revinodes: sanakirja::Db<self::key::Key<PatchId>, self::inode::Inode>,
/// Text contents of keys.
The inodes
and revinodes
table map inodes
(i.e. files in the working directory) to vertices in the graph, plus
states of the file. The values in the inodes
table have
type FileHeader
, which contains an inode, plus the
pre-record "state" of the file, i.e. what the user did with the file
recently.
contents: sanakirja::Db<self::key::Key<PatchId>, sanakirja::value::UnsafeValue>,
The contents
table just maps nodes in the graph to their
binary contents.
/// A map from external patch hashes to internal ids.
internal: sanakirja::Db<self::hash::UnsafeHash, self::patch_id::PatchId>,
/// The reverse of internal.
external: sanakirja::Db<self::patch_id::PatchId, self::hash::UnsafeHash>,
The internal
and external
tables map
internal patch ids (64-bit integers) to external patches (SHA512
hashes), in both directions.
/// A reverse map of patch dependencies, i.e. (k,v) is in this map
/// means that v depends on k.
revdep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
The revdep
table contains a reverse mapping of the
dependencies. Dependencies of a patch are stored in the patch file
itself, but the unrecord
command needs to know whether a
patch is still depended upon before accepting to remove it. This table
provides that information.
/// A map from branch names to graphs.
branches:
sanakirja::Db<self::small_string::UnsafeSmallStr, (NodesDb, PatchSet, RevPatchSet, u64)>,
The branches
table is a map from branch names to a
triple of:
- A
NodesDB
, representing the graph of the branch. - A
PatchSet
, mapping internal patch identifiers to their sequence number (a 64-bit integer) on the branch. - A
RevPatchSet
, mapping sequence numbers to internal patch identifiers. In practice, this table is implementing a randomly-addressable stack.
/// A map of edges to patches that remove them.
cemetery: sanakirja::Db<(self::key::Key<PatchId>, self::edge::Edge), self::patch_id::PatchId>,
The cemetery
table is used in unrecord
, and
is used to indicate what to do in the situation where two patches change
the label of the same edge, and we unrecord one of them. What should be
the label of the resulting edge? The cemetery
table stores
which patch deletes which edge.
This table duplicates information contained in the patches, but makes things more self-contained and faster to read when needed.
/// Dependencies
dep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
The dep
table maps internal patch identifiers to their
dependencies. This piece of information is also contained in the
patches, but having it here makes the database more self-contained and
faster to access.
/// Files touched by patches.
touched_files: sanakirja::Db<self::key::Key<PatchId>, self::patch_id::PatchId>,
The touched_files
table maps vertices in the graph
(vertices representing files) to patch identifiers. It is used to tell
which patches to pull if we only want the subset of patches that touched
a certain file or directory in the repository.
/// Partial checkouts: branch -> partial
partials: sanakirja::Db<self::small_string::UnsafeSmallStr, self::key::Key<PatchId>>,
The partials
table maps branch names to the subset of
the repository tracked by this branch. When we do a partial clone, we
fully apply all the patches related to the part of the repository we
want to get. However, these patches might also touch other parts that we
don't want to output. This table tells what parts of the repository we
want to work on in that case.