Pijul - The Mathematically Sound Version Control System Written in Rust
In our Evolution of Version Control System Internals post, we covered the inner workings of many version control systems, both historical and current. However, we haven't really covered the possible future of version control. How will this field evolve going forward?
Although Git is dominant now and will certainly remain a strong player for years to come, will better tools be created? Pijul could be a strong contender.
In this article, we'll discuss Pijul, an alpha-stage version control system that is gaining attention in the community.
Pijul is a VCS written by Pierre-Étienne Meunier and Florent Becker. After releasing a number of experimental prototypes between 2015 and 2020, the first alpha version was released in November 2020. By operating on diffs and on versions at the same time, Pijul combines aspects of third-generation VCS such as Git and Darcs.
Pijul is written in Rust and is currently in alpha stage development. The algorithms and formats underlying Pijul's design were recently overhauled for performance and robustness of the system, and the team is working on making it as stable as possible after these updates.
One distinctive feature of Pijul (shared with Darcs) is change commutation, whereby changes that could be recorded independently can be applied in any order, without affecting the result.
However, unlike Darcs, which operates on changes only, Pijul applies changes to an abstract data structure representing generalized files, allowing it to maintain a notion of version as well as a notion of change between versions. This has a number of advantages, in particular in terms of performance and in terms of mathematical soundness.
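To make change commutation concrete, here is a minimal sketch in Python. It is a toy model, not Pijul's actual data structure or on-disk format: we assume a file is a graph whose vertices are lines and whose edges mean "this line comes before that line", and that a change simply adds vertices and edges. All names (`apply_change`, `insert_x`, `insert_y`) are made up for illustration.

```python
from itertools import permutations

# Toy model (an assumption for illustration, not Pijul's real format):
# a state is a pair (lines, edges); an edge (a, b) means "a comes before b".
EMPTY = (frozenset(), frozenset())


def apply_change(state, change):
    """Return a new state with the change's lines and ordering edges added."""
    lines, edges = state
    return (lines | change["lines"], edges | change["edges"])


# A base change that creates a three-line file: A, B, C.
base = {
    "lines": frozenset({"A", "B", "C"}),
    "edges": frozenset({("A", "B"), ("B", "C")}),
}

# Two independent edits: X goes between A and B, Y goes between B and C.
insert_x = {"lines": frozenset({"X"}), "edges": frozenset({("A", "X"), ("X", "B")})}
insert_y = {"lines": frozenset({"Y"}), "edges": frozenset({("B", "Y"), ("Y", "C")})}

after_base = apply_change(EMPTY, base)

# Apply the two independent edits in every possible order.
results = set()
for order in permutations([insert_x, insert_y]):
    state = after_base
    for change in order:
        state = apply_change(state, change)
    results.add(state)

# Both orders lead to exactly one final state.
print(len(results))
```

Because applying a change in this model is a set union, the order of independent changes cannot affect the outcome, which is the property the paragraph above describes.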
The source code for Pijul can be found at https://nest.pijul.com/pijul/pijul. The Pijul Nest is a remote hosting platform for Pijul repositories. Think of it as Pijul's version of GitHub or BitBucket.
We already have Git, the most popular and functional VCS on Earth. Git handles all of the features we could expect from a solid VCS (in fact it sets the benchmark for these features), including:
- Intuitive method and interface for version tracking
- Easily sharing work and collaborating remotely
- Lightweight branching and merging
- Fast performance and security features
- A long list of tools for conveniently managing repositories depending on desired workflow and personal style
We also have Darcs, a patch-centric perspective on version control with the following advantages:
- In Darcs, a repository can be better thought of as a "set of patches" - applied as needed - as opposed to a linear history of dependent changesets.
- The Darcs model preserves the identity of patches during operations like rebasing and cherry-picking, whereas Git sometimes needs to rewrite history due to chained identifiers that depend on the order of application. Preserving a change's ID can be considered a more natural approach.
- Darcs has a very well-designed interface and command set, with verbose output that shortens the learning curve and makes it clear what each action does.
- Patch bundles can be easily transmitted via email to be applied by the remote repository owner.
So since we have Git and Darcs, why do we need Pijul? Pijul was created to solve separate problems that exist in Git and in Darcs.
Pijul uses a patch-centric model similar to Darcs, which doesn't require history to be rewritten when reordering, cherry-picking, or otherwise reorganizing patches. All patches retain their identities permanently regardless of their context, order, operations performed, or team workflow. This is a very elegant solution and arguably a more natural way to create such a system. This is in contrast to Git, in which certain operations such as rebases and cherry-picks can change commit IDs (and other identifiers), even if the content itself doesn't change.
Furthermore, subsequent cherry-picks from a remote branch in Git can lead to unnatural conflicts due to the rewriting of the initial cherry-picked commit's ID. Pijul avoids this problem completely as patches always retain their identity, regardless of their location in a branch.
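The difference between the two identity schemes can be sketched in a few lines of Python. This is a deliberate simplification: real Git commit IDs cover more than a parent and a message, and real Pijul change hashes cover the full change contents. The functions `chained_id` and `patch_id`, and the parent strings, are hypothetical names used only for this example.

```python
import hashlib


def chained_id(parent_id: str, content: str) -> str:
    """Identifier that covers the parent as well as the content (commit-chain style)."""
    data = f"parent {parent_id}\n{content}".encode()
    return hashlib.sha256(data).hexdigest()


def patch_id(content: str) -> str:
    """Identifier that covers only the change itself (patch style)."""
    return hashlib.sha256(content.encode()).hexdigest()


fix = "fix typo in README"

# The same fix placed on top of two different histories.
on_main = chained_id("parent-on-main", fix)
on_feature = chained_id("parent-on-feature", fix)

# The chained identifiers differ even though the content is identical.
print(on_main == on_feature)

# The content-only identifier is the same wherever the change is applied.
id_on_main = patch_id(fix)
id_on_feature = patch_id(fix)
print(id_on_main == id_on_feature)
```

With chained identifiers, moving a change to a new parent produces a new ID, which is why later cherry-picks can fail to recognize it. With content-only identifiers, the change is recognized as the same change everywhere.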
So what about Darcs? In certain scenarios, Darcs runs into performance issues such as the exponential merge problem. This issue causes certain merges to increase exponentially in difficulty, effectively preventing these merges from being performed. Pijul has solved this problem.
As summed up nicely in Pierre Meunier's recent post Towards 1.0: "Our goals are to find the smallest possible system, both for reasons of mathematical aesthetics (why store useless stuff?) and the other one for performance."
Pijul's main purpose is to be an efficient VCS based on a sound mathematical theory, guaranteeing that basic properties of changes are always maintained. This consistency bolsters peace of mind in the software development process. With Pijul, developers can be 100% confident that the code they reviewed is the code that gets merged, which is not necessarily the case in Git and Mercurial. Even though file reshuffles do not seem to happen very often in these existing VCSs (and some of them are caught by tests), a few statistical studies highlight their occurrence, and the security implications are huge.
One particular goal of Pijul is to model conflicts as normal states of collaboration, so that conflicts are resolved by normal changes, valid even for the same conflicts in any other context.
A Pijul repository has a pristine directory, containing a number of channels. At any given time, a channel contains a set of unordered changes, which can also be seen as a version, since the order of independent changes does not matter in Pijul.
The "working copy" is simply the set of files directly editable by the user, and the correspondence between the working copy and the pristine is done by a file tracking tree, which is just a mapping between working copy files and files as stored in the pristine.
Moreover, changes can (but don't need to) depend on each other, and do so explicitly (see the section about the "sample change" below), in the sense that each change is uniquely identified by its cryptographic hash, and dependencies are explicit hashes of other changes. The minimal dependencies are enforced by Pijul to make sure that text edits make sense. For example, a change editing a file or a paragraph depends on the change that introduced that file or paragraph. This is because it doesn't make sense to change a piece of content that was never added in the first place, so the patch that added the content must be present for the patch that changed it to have meaning. Additionally, the user may specify extra, language-specific dependencies to model the edits more accurately, for example the dependency between introducing a function and using it in another file or another part of the same file. This is an extremely powerful feature.
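The rule that a change can only be applied once its dependencies are present can be sketched as follows. This is a toy model under our own assumptions: change identifiers are plain strings instead of cryptographic hashes, and `Channel`, `apply`, and `MissingDependency` are hypothetical names, not part of Pijul's API.

```python
class MissingDependency(Exception):
    """Raised when a change is applied before the changes it depends on."""


class Channel:
    def __init__(self):
        # Maps each applied change to the set of changes it depends on.
        self.changes = {}

    def apply(self, change_id, depends_on=()):
        """Apply a change only if every dependency is already in the channel."""
        missing = [dep for dep in depends_on if dep not in self.changes]
        if missing:
            raise MissingDependency(f"{change_id} needs {missing}")
        self.changes[change_id] = frozenset(depends_on)


channel = Channel()

# Editing a paragraph before the file exists makes no sense, so it is refused.
try:
    channel.apply("edit-paragraph", depends_on=["add-file"])
    refused = False
except MissingDependency as err:
    refused = True
    print(err)

# Once the change that adds the file is present, the edit can be applied.
channel.apply("add-file")
channel.apply("edit-paragraph", depends_on=["add-file"])
print(sorted(channel.changes))
```

An extra, user-specified dependency (for example, "uses-function" depending on "adds-function") would be expressed the same way in this sketch, by listing it in `depends_on`.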
If desired, this scheme of dependencies between changes allows Pijul to mimic the strict sequential ordering of commits used by Git and Mercurial, turning Pijul into a sort of "Git, but with mathematically sound merges". The downside of using Pijul like this is that changes relative to independent features of the project might need to be more carefully split between different channels, like Git branches. The "plain", or "standard" Pijul way is to try and record changes that are as independent as possible, and keep them on the same channel, since independent changes can always be split later on without changing their identity (i.e. their hash). Channels are useful for different "flavors" of the project, and one can push the same changes to multiple channels without modifying these changes.
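To see how dependencies can reproduce a strict sequential history, the sketch below counts the orders in which a set of changes can legally be applied. It is only an illustration under our own assumptions: changes are named "a", "b", and "c", and `valid_orders` is a hypothetical helper that brute-forces every permutation.

```python
from itertools import permutations


def valid_orders(deps):
    """Return every order in which all changes can be applied.

    `deps` maps a change name to the set of changes it depends on.
    An order is valid if each change comes after all of its dependencies.
    """
    orders = []
    for order in permutations(deps):
        applied = set()
        ok = True
        for change in order:
            if not deps[change] <= applied:
                ok = False
                break
            applied.add(change)
        if ok:
            orders.append(order)
    return orders


# Three independent changes: any order works.
independent = {"a": set(), "b": set(), "c": set()}

# Each change depends on the previous one, like a chain of commits.
chained = {"a": set(), "b": {"a"}, "c": {"b"}}

print(len(valid_orders(independent)))
print(valid_orders(chained))
```

With no dependencies, all six orderings of three changes are valid. When every change depends on its predecessor, only one ordering remains, which is the Git-like sequential mode described above.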
Q&A with the Creator
We wanted to get inside the head of Pierre-Étienne Meunier, the creator and lead developer of Pijul, so we asked him a series of questions related to his background and the creation of Pijul, and the direction of the version control field. His answers were extremely interesting and worth a read (we split them into a separate post since they were fairly lengthy).
One of Pijul's goals is to minimize the number of commands, so as to allow users to get a full understanding of the system as quickly as possible.
The commands are:
pijul init: Creates a directory named .pijul, containing the following structure:
.pijul/ changes config pristine db0 db.lock
Here, the meaningful things that get created are db0, which contains the pristine, in binary format, and a sample config file, editable in TOML format.
pijul add <filename.ext>: Adds a file to the repository's tracking list.
pijul remove <filename.ext>: Remove a file from the tracking list.
pijul mv <filename1.ext> <filename2.ext>: Move and/or rename a file in the tracking list.
pijul ls: Displays a list of currently tracked files.
pijul record (or pijul rec): Creates a change and applies it to the pristine. Once we do that, the .pijul/changes gets populated with one change:
./ file .pijul/ changes ZN PGE4DJNVY4JAQABNSQYVF5LBFWNO6FRJI3LXX7E7EB4Y3NGGGQC.change config pristine db0 db.lock
pijul unrecord <hash>: If no change depends on a change hash, we can also "undo" or "unapply" it using the unrecord command. For example, here our change's hash is ZNPGE4DJNVY4JAQABNSQ..., and we can use any unambiguous prefix of that hash, for example pijul unrecord ZNPG, or even pijul unrecord Z to undo it.
pijul reset: Resets the repository to the state of a channel. Without any argument, the current channel is used, and pijul reset --channel can be used to change the current channel.
pijul fork: Creates an independent channel with the same changes as the current channel.
pijul channel: Lists all the channels.
pijul channel delete <channel>: Deletes an existing channel.
pijul channel rename <channel1> <channel2>: Renames an existing channel.
pijul push: Sends changes to a remote repository. For example, contributing to Pijul can be done with pijul push email@example.com:pijul/pijul.
pijul pull: Gets changes from a remote repository.
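The two rules behind pijul unrecord described above (any unambiguous hash prefix works, and a change cannot be removed while another change depends on it) can be sketched in Python. This is our own toy model, not Pijul's implementation: `resolve_prefix` and `unrecord` are hypothetical helpers, and the second hash is made up for the example.

```python
def resolve_prefix(prefix, hashes):
    """Return the single hash that starts with `prefix`, or raise an error."""
    matches = [h for h in hashes if h.startswith(prefix)]
    if not matches:
        raise KeyError(f"no change starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous: {sorted(matches)}")
    return matches[0]


def unrecord(prefix, changes):
    """Remove a change from `changes` (hash -> dependency set) if nothing needs it."""
    target = resolve_prefix(prefix, changes)
    dependents = sorted(h for h, deps in changes.items() if target in deps)
    if dependents:
        raise ValueError(f"{target} is still needed by {dependents}")
    del changes[target]
    return target


FIRST = "ZNPGE4DJNVY4JAQABNSQYVF5LBFWNO6FRJI3LXX7E7EB4Y3NGGGQC"
SECOND = "ZQHYPOTHETICALHASH"  # made up for this example; depends on FIRST

changes = {FIRST: set(), SECOND: {FIRST}}

try:
    resolve_prefix("Z", changes)
except ValueError as err:
    print(err)  # both hashes start with "Z", so the prefix is ambiguous

try:
    unrecord("ZNPG", changes)
except ValueError as err:
    print(err)  # SECOND still depends on FIRST

print(unrecord("ZQ", changes))  # nothing depends on SECOND, so it is removed
print(unrecord("Z", changes))   # "Z" is unambiguous now and removes FIRST
```

In a repository with a single change, a one-letter prefix is enough, which matches the pijul unrecord Z example given in the command list.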
Sample changefile format
The following content represents the change one gets when adding a single file with three lines to the repository. Once recorded and converted to a binary format for performance reasons (because the change files are read by Pijul quite often), its hash is
    message = 'Adding a file'
    timestamp = '2020-11-20T16:46:51.098461926Z'

    [[authors]]
    name = 'pmeunier'
    full_name = 'Pierre-Étienne Meunier'

    # Changes

    1. File addition: "file" in "/" 644
      up 1.0, new 0:6
    + First line
    + Second line
    + Third line
If we add a line "Another line" between the second and third line, we get the following change:
    message = 'Adding another line'
    timestamp = '2020-11-20T16:47:58.634094619Z'

    [[authors]]
    name = 'pmeunier'
    full_name = 'Pierre-Étienne Meunier'

    # Dependencies

    LXCA3JPGWBSHNWTJLAWLANNSBDQUM4XA3ZLSGKVW6OETSFNYI4QAC

    # Changes

    1. Edit in file:2 2.7
      up 2.31, new 0:13, down 2.31
    + Another line

Note the presence of the original change as a dependency, identified by its hash in the Dependencies section.
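As a rough illustration of how the text form of a change could be inspected, the Python sketch below splits a change into its sections and lists its dependencies. The exact line layout of the sample text is our assumption, reconstructed from the example above, and `split_sections` and `dependencies` are hypothetical helpers. Pijul itself reads the binary form of changes, not this text.

```python
# Sample text based on the second change above; the line layout is assumed.
CHANGE_TEXT = """\
message = 'Adding another line'
timestamp = '2020-11-20T16:47:58.634094619Z'

[[authors]]
name = 'pmeunier'
full_name = 'Pierre-Étienne Meunier'

# Dependencies
LXCA3JPGWBSHNWTJLAWLANNSBDQUM4XA3ZLSGKVW6OETSFNYI4QAC

# Changes
1. Edit in file:2 2.7
  up 2.31, new 0:13, down 2.31
+ Another line
"""


def split_sections(text):
    """Group non-empty lines under the most recent '# Section' marker."""
    sections = {"header": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
        elif line.strip():
            sections[current].append(line)
    return sections


def dependencies(text):
    """Return the hashes listed in the Dependencies section, if any."""
    lines = split_sections(text).get("Dependencies", [])
    # Take the last token so a leading label on the line would be ignored.
    return [line.split()[-1] for line in lines]


print(dependencies(CHANGE_TEXT))
print(split_sections(CHANGE_TEXT)["Changes"][-1])
```

A change with no Dependencies section, such as the first sample, would simply yield an empty list here.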
In summary, Pijul is a promising project that approaches version control in an elegant fashion. It strives for many of the objectives that are lacking in current tools like Git and Darcs. It solves complex problems with mathematical rigor and performance in mind. Software development always makes room for new and better solutions to existing problems. Keep an eye on Pijul because it might just lead the push into the next generation of VCS tools.
If you're interested in learning more about how version control systems work under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how version control systems work at the code level. To do this, we documented the first version of Git's code and discussed it in detail.
We hope you enjoyed this post! Feel free to shoot me an email at firstname.lastname@example.org with any questions or comments.