Yestin L. Harrison

Why Pijul?

Pijul is the first distributed version control system to be based on a sound mathematical theory of changes. It is inspired by Darcs, but aims at solving the soundness and performance issues of Darcs.

Pijul has a number of features that allow it to scale to very large repositories and fast-paced workflows. In particular, change commutation means that changes written independently can be applied in any order, without changing the result. This property simplifies workflows, allowing Pijul to clone sub-parts of repositories, to solve conflicts reliably, and to easily combine different versions.

Change commutation

In Pijul, for any two changes A and B, either A and B can be applied in any order, or A depends on B, or B depends on A.
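
The property above can be illustrated with a toy model. This is only a sketch under loud assumptions, not Pijul's actual data structures: the `Change` class, its fields, and the `apply` function are hypothetical names, a file is modelled as a set of lines plus a set of "comes before" constraints, and applying a change is a set union guarded by a dependency check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    # Hypothetical toy type, not Pijul's real change format.
    name: str
    deps: frozenset   # names of the changes this one depends on
    lines: frozenset  # (line_id, text) pairs added by this change
    edges: frozenset  # (earlier_line_id, later_line_id) ordering constraints

# A state is (names of applied changes, lines, edges).
EMPTY = (frozenset(), frozenset(), frozenset())

def apply(state, change):
    """Apply a change, refusing it if one of its dependencies is missing."""
    applied, lines, edges = state
    missing = change.deps - applied
    if missing:
        raise ValueError(f"{change.name} depends on unapplied changes: {sorted(missing)}")
    return (applied | {change.name}, lines | change.lines, edges | change.edges)

base = Change("base", frozenset(), frozenset({(1, "A"), (2, "B")}), frozenset({(1, 2)}))
a = Change("a", frozenset({"base"}), frozenset({(3, "G")}), frozenset({(3, 1)}))
b = Change("b", frozenset({"base"}), frozenset({(4, "X")}), frozenset({(1, 4), (4, 2)}))
c = Change("c", frozenset({"a"}), frozenset({(5, "H")}), frozenset({(5, 3)}))

start = apply(EMPTY, base)

# a and b are independent of each other, so both orders give the same state.
assert apply(apply(start, a), b) == apply(apply(start, b), a)
```

In this sketch, `c` depends on `a`, so `apply(start, c)` raises a `ValueError` until `a` has been applied: the pair falls into the "A depends on B" case rather than the commuting case.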

Associativity

In Pijul, change application is an associative operation, meaning that applying some change A, and then a set of changes (BC) at once, yields the same result as applying (AB) first, and then C.

With branches, the first scenario looks like this: Bob creates A while Alice creates B and C, and Bob finally merges both B and C at once.

In the second scenario, Bob creates commit A and then pulls B. At that point, Bob has both A and B on his branch and wants to pull C from Alice.

Note that this is different from change reordering: here, we apply A, then B, then C, in the same order in both scenarios.
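
The two scenarios can be compared in a small sketch. It assumes a toy model that is not Pijul's real representation: a change is a set of "line X comes before line Y" constraints, applying a change is a set union, and applying several changes at once means applying their union. The names `apply`, `compose`, `render`, and the line labels are hypothetical. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

def apply(state, change):
    """Apply one change (a frozenset of ordering constraints) to a state."""
    return state | change

def compose(*changes):
    """Bundle several changes so that they can be applied at once."""
    return frozenset().union(*changes)

def render(state):
    """List the lines in an order that respects every constraint."""
    preds = {}
    for before, after in state:
        preds.setdefault(before, set())
        preds.setdefault(after, set()).add(before)
    return list(TopologicalSorter(preds).static_order())

change_a = frozenset({("line1", "line2")})
change_b = frozenset({("line2", "line3")})
change_c = frozenset({("line3", "line4")})
start = frozenset()

# First scenario: A, then (BC) at once.
first = apply(apply(start, change_a), compose(change_b, change_c))
# Second scenario: (AB) first, then C.
second = apply(apply(start, compose(change_a, change_b)), change_c)

assert first == second
print(render(first))
```

Because the constraints here form a single chain, `render` has only one valid order to return. In both scenarios the changes are applied in the order A, B, C; only the grouping differs.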

Using math words such as "associative" for such a simple operation may sound like nitpicking, because intuition suggests that it should always be the case. However, Git doesn't guarantee that property, even if A, B, and C do not conflict. Concretely, this means that Git (and relatives) can sometimes shuffle lines around, because these systems only track versions, rather than the changes that happen between the versions. And even though one can reconstruct one from the other, the following example (taken from here) shows that tracking versions only does not yield the expected result.

[Figure, two panels. Left: Git merge (which A is which?). Right: Pijul merge.]

In this diagram, Alice and Bob start from a common file with the lines A and B. Alice adds G above everything, and then another instance of A and B above that (her new lines are shown in green). Meanwhile, Bob adds a line X between the original A and B.

Git, SVN, Mercurial and similar systems merge this example into the file shown on the left, with the relative positions of G and X swapped, whereas Pijul (and Darcs) yield the file on the right, preserving the order between the lines. Note that this example has nothing to do with a conflict, since the edits happen in different parts of the file. In fact, neither Git nor Pijul reports a conflict in this case.

The reason for the counter-intuitive behaviour in Git is that Git runs a heuristic algorithm called three-way merge or diff3, which extends diff to two "new" versions instead of one. Note, however, that diff has multiple optimal solutions, and the same change can be described equivalently by different diffs. While this is fine for diff (since the patch resulting from diff has a unique interpretation), it is ambiguous in the case of diff3 and might lead to arbitrary reshuffling of files.
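
The ambiguity can be made concrete with the example of Alice and Bob. The sketch below is not Git's diff3 implementation. It only enumerates every way the original lines A and B can be matched against Alice's file, and then uses a deliberately simplified placement rule (an assumption of this sketch) that puts Bob's X right after whichever line was matched to the original A. The names `alignments` and `place_x` are hypothetical.

```python
from itertools import combinations

def alignments(base, new):
    """Every way to match all lines of `base`, in order, to equal lines of `new`."""
    return [
        positions
        for positions in combinations(range(len(new)), len(base))
        if all(new[p] == line for p, line in zip(positions, base))
    ]

def place_x(alice, alignment):
    """Insert Bob's X right after the line of `alice` matched to the original A.

    This is a simplified placement rule assumed for the sketch.
    """
    after_a = alignment[0] + 1
    return alice[:after_a] + ["X"] + alice[after_a:]

base = ["A", "B"]
alice = ["A", "B", "G", "A", "B"]

for alignment in alignments(base, alice):
    print(alignment, "".join(place_x(alice, alignment)))
# prints:
# (0, 1) AXBGAB
# (0, 4) AXBGAB
# (3, 4) ABGAXB
```

All three alignments keep both original lines and describe Alice's edit as three inserted lines, so a diff tool has no reason to prefer one over the others. Only the alignment `(3, 4)` matches what Alice actually did, and it is the only one that keeps X below G.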

Obviously, this does not mean that the merge will have the intended semantics: code should still be reviewed and tests should still be run. But at least a review of the change will not be made useless by a reshuffling of lines by the version control tool.

Modelling conflicts

Conflicts are a normal thing in the internal representation of a Pijul repository. Actually, after applying new changes, we even have to do extra work to find where the conflicts are.

In particular, changes editing sides of a conflict can be applied without resolving the conflict. This guarantees that no information ever gets lost.

This is different from both Git and Darcs:

Comparisons with other version control systems

Pijul for Git/Mercurial/SVN/… users

The main difference between Pijul and Git (and related systems) is that Pijul deals with changes (or patches), whereas Git deals only with snapshots (or versions).

There are many advantages to using changes. First, changes are the intuitive atomic unit of work. Moreover, changes can be merged according to formal axioms that guarantee correctness in 100% of cases, whereas commits have to be stitched together based on their contents, rather than on the edits that took place. This is why in these systems, conflicts are often painful, as there is no real way to solve a conflict once and for all (for example, Git has the rerere command to try and simulate that in some cases).

Pijul for Darcs users

Pijul is mostly a formally correct version of Darcs' theory of changes, as well as a new algorithm for merging changes. Its main innovation compared to Darcs is to use a better data structure for its pristine, allowing for:

However, Pijul's pristine format was designed to comply with axioms on a specific set of operations only. As a result, some of Darcs' features, such as darcs replace, are not (yet) available.