Q&A with the Creator of the Pijul Version Control System
This article is a Q&A with Pierre-Étienne Meunier, the creator and lead developer of the Pijul VCS (version control system).
Q: What is your background in computer science and software development?
A: I've been an academic researcher for about 10 years, and I've recently left academia to found a science company working on issues related to energy savings and decentralised energy production. I'm the only computer scientist, and we work with sociologists and energy engineers.
While I was in academia, my main area of work was asynchronous and geometric computing, whose goal is to understand how systems with lots of simple components can interact in a geometric space to make computation happen. My current work is also somewhat related to this, although in a different way.
My favourite simple components differ from man-made "silicon cores", in that they happen in nature or at least in the real world, and are not made for the specific purpose of computation. For many years, I've worked on getting molecules to self-organise in a test tube to form meaningful shapes. And I've also worked on Pijul, where different authors edit a document in a disorderly way, without necessarily agreeing on "good practices" beforehand.
The idea of Pijul came while Florent Becker and I were writing a paper on self-assembly. At some point we started thinking about the shortcomings of Darcs (Florent was one of the core contributors of Darcs at the time). We decided that we had to do something about it, to keep the "mathematically designed" family of version control systems alive.
Q: What is your approach to learning, furthering your knowledge and understanding on a topic? What resources do you use?
A: The thing I love the most about computer science is that it can be used to think about a wide variety of subjects of human knowledge, and yet you can get a very concrete and very cheap experience of it by programming a computer. In all disciplines, the main way for virtually anyone to learn technical and scientific things is to play with them.
For example, you can read a book about economics, and then immediately write little simulations of games from game theory, or a basic simulator of macroeconomics. You can read the Wikipedia page about wave functions, and then start writing code to simulate a quantum computer. These simulations will not be efficient or realistic, but they will allow their authors to formalise their ideas in a concrete and precise way.
By following this route, you not only get a door into the subject you wanted to understand initially, you also get access to philosophical questions that seemed very abstract before, since computer science is a gateway between the entire world of "pure reason" (logic and mathematics), as Kant would say, and the physical world. What can we know for a fact? Is there a reality beyond language? And suddenly you get a glimpse of what Hume, Kant, Wittgenstein… were after.
Q: What was the first programming language you learned and how did you get into it?
A: I think I started with C when I was around 12. My uncle had left France to take an administrative position high up at the Vatican, and left most of his things with his brothers and sisters. My mother got his computer, a Victor v286, which was already pretty old when we got it (especially in the days when Moore's law was at its peak). Almost nothing was supplied with it: MS-DOS, a text processor, and a C IDE. So if I wanted to use it, I had little choice. I don't remember playing much with the text processor.
Q: Why did you decide to start a Version Control System? What is it about VCS that interests you?
A: My first contact with version control systems was with SVN at university, and I remember being impressed when I first started using it for the toy project we were working on.
Then, when I did my PhD, my friends and colleagues convinced me to switch to Darcs. As the aspiring mathematician I was, I found the idea of a rigorous patch theory, where detecting conflicts was done entirely by patch commutation, appealing. Like everybody else, I ran every now and then into Darcs' issues with conflicts, until Florent and I noticed that (1) it had become nearly impossible to convince our skeptical colleagues to install and use it and (2) this particular corner of version control systems was surprisingly close to the rest of our research interests, and that's how we got started.
Q: How do you decide what projects to work on?
A: This is one of my biggest problems in life. I'm generally interested in a large number of things, and I have little time to explore all of them in the depth they deserve. When I have to choose, I try to do what I think will teach me (and hopefully others) the most, or what will change things the most.
Q: Why did you choose to write Pijul in Rust?
A: We didn't really choose. Rust was the only language at the time that ticked all the boxes:
Statically typed with automatic memory management, because we were writing mathematical code, and we were just two coders, so we needed as much help from compilers as we could get. That essentially meant one of OCaml, Haskell, Scala, Rust, Idris.
Fast, because we knew we would be benchmarked against Git. That ruled out Scala (because of startup times) and Idris, which was experimental at the time. Also, Haskell can be super fast, but the performance is hard to guarantee deterministically.
Could call C functions on Windows. That ruled out OCaml (this might have been fixed since then), and was a strong argument for Rust, since the absence of a GC makes this particularly easy.
An additional argument for Rust is that compiling stuff is super easy. Sure, there are system dependencies, but most of the time, things "just work". It works so well that people are encouraged to split their work into multiple crates rather than bundling it up into a large monolithic library. As a consequence, even very simple programs can easily have dozens of dependencies.
Q: When you encounter a tough development problem, what is your process to solve it? Especially if the problem is theoretical in nature?
A: It really depends on the problem, but I tend to start all my projects by playing with toy examples, until I understand something new. I take notes about the playing, and then try to harden the reasoning by making the intuitive parts rigorous. This is often where bugs (both in mathematical proofs and in software) are hidden. Often, this needs a new iteration of playing, and sometimes many more. I usually find that coding is a good way to play with a problem, even though it isn't always possible, especially in later iterations of the project.
Of course, for things like Pijul, code is needed all the way to the end, but it is a different sort of code, not the "prototype" kind used to understand a problem.
Apart from that, I find that taking a break to walk outside, without forcing myself to stay too focused on the problem, is very useful. I also do other, more intensive sports, but they rarely help solve my problems.
Q: How big is the Pijul team? Who is it made up of?
A: For quite a while there was just Florent and myself, then a few enthusiasts joined us to work on the previous version: we had about 10 contributors, including Thomas Letan who, in addition to his technical contributions, organised the community, welcomed people, etc., and Tae Sandoval, who wrote a popular tutorial (Pijul for Git users), and has been convincing people on social media at an impressive rate for a few years.
Now that we have a candidate version that seems applicable to real-life projects, the team has started growing again, and the early alpha releases of version 1.0.0, even though they still have a few bugs, are attracting new contributors and testers.
Q: What are your goals for Pijul capabilities, growth, and adoption?
A: When Pijul becomes stable, it will be usable for very large projects, and bring more sanity to source code management, by making things more deterministic.
This has the potential to save a large number of engineering hours globally, and to use continuous integration tools more wisely: indeed, on large repositories, millions of CPU-hours are wasted each year just to check that Git and others didn't shuffle lines around during a merge.
Smaller projects, beginners and non-technical people could also benefit from version control, but they aren't using it now, or at least not enough, because the barrier is just too high. Moreover, in some fields of industry and administration, people are doing version control manually, which seems like a giant waste of human time to me, but I also know that no current tool can do that job.
About growth and adoption, I also want to mention that Pijul started as an open source project, and that we are strongly committed to keeping it open source. Since there is a large amount of tooling to be developed around it to encourage adoption (such as text editor plugins, CI/CD tooling…), we are currently trying to build a commercial side as well, in order to fund these developments. This could be in the form of support, hosting, or maybe specialised applications of Pijul, or all that at the same time.
Q: Do you think Pijul (or any other VCS) could ever overtake Git?
A: I think there could be a space for both. Git is meant as a content-addressable storage engine, and is extraordinarily efficient at that. One thing that is particularly cool with focusing on versions is that diffs can be computed after the fact, and I can see how this is desirable sometimes. For instance, in cases where only trivial merges happen (such as adding a file), as with scientific data, this models reality quite well, as there is no "intent" of an "author" to convey.
In Pijul, we can have different diff algorithms, for instance taking a specific file format into account, but they have to be chosen at record time. The advantages are that the merge is unambiguous, in the sense that there is only one way to merge things, and that way satisfies a number of important properties not satisfied by Git. For example, the fact that merging two changes at once has the same effect as merging the first one, and then the other (as bizarre as it sounds, Git doesn't always do that).
However, because Git does not operate on changes (or "diffs"), it fails to model how collaboration really works. For example, when merging changes made on different branches, Git insists on ordering them, which is not actually what happened: in reality, the authors of the two branches worked in parallel, and merged each other's changes in different orders. This sounds like a minor difference, but in real life, it forces people to reorder their history all the time, letting Git guess how to reshuffle their precious source code in ways that are not rigorous at all.
Concretely, if Alice and Bob each produce a commit in parallel, then when they pull each other's change, they should get the exact same result, and see the same conflicts if there are any. It shouldn't matter whether it is Alice or Bob who solves the conflicts, as long as the resolution works for both of them. If they later decide to push the result to another repository, there is no reason why the conflicts should reappear. They sometimes do in Git (which is the reason for the git rerere command), whereas the theory of Pijul guarantees that this never happens.
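One way to see why the order of pulls need not matter is a toy model in which a repository is a set of changes rather than a sequence of snapshots. The Python sketch below is only an illustration under strong assumptions, not Pijul's actual data structures or algorithms: the `Change` class, the `ROOT` and `END` markers, and the `apply`, `merge` and `conflicts` functions are all hypothetical names. Each change inserts one line between two existing lines, and a conflict is reported when two lines were inserted into the same gap.

```python
from dataclasses import dataclass

ROOT, END = "ROOT", "END"  # hypothetical markers for the start and end of the file


@dataclass(frozen=True)
class Change:
    """Toy change: insert one new line between two existing lines."""
    line_id: str  # globally unique identifier of the new line
    up: str       # identifier of the line just above the insertion point
    down: str     # identifier of the line just below the insertion point
    text: str


def apply(repo, change):
    """Applying a change adds it to the set; no ordering is recorded."""
    return repo | {change}


def merge(repo, changes):
    """Merge several changes at once."""
    return repo | frozenset(changes)


def conflicts(repo):
    """Report every gap into which more than one line was inserted."""
    by_gap = {}
    for change in repo:
        by_gap.setdefault((change.up, change.down), []).append(change.line_id)
    return {gap: sorted(ids) for gap, ids in by_gap.items() if len(ids) > 1}


base = frozenset({
    Change("l1", ROOT, END, "fn main() {"),
    Change("l2", "l1", END, "}"),
})
alice = Change("a1", "l1", "l2", '    println!("hello");')
bob = Change("b1", "l1", "l2", '    println!("bonjour");')

alice_repo = apply(apply(base, alice), bob)  # Alice pulls Bob's change
bob_repo = apply(apply(base, bob), alice)    # Bob pulls Alice's change

# Same state whatever the order, and the same as merging both at once.
assert alice_repo == bob_repo == merge(base, [alice, bob])
# Both sides see the same conflict: two lines inserted between l1 and l2.
assert conflicts(alice_repo) == conflicts(bob_repo)
print(conflicts(alice_repo))
```

Because the state here is a set, applying changes one by one or all at once cannot give different results. A real system has to preserve that property while also handling deletions, conflict resolutions and file operations, which this sketch leaves out.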
Q: If "distributed version control" is the 3rd generation of VCS tools, do you anticipate a 4th generation? If so, what might that look like? What might be the distinguishing feature/aspect of the next generation VCS?
A: I believe "asynchronous" is the keyword here. Git, Mercurial, Fossil, etc. are distributed in the sense that each instance is a server, but they are really just replicated instances of a central source of authority (usually hosted on GitHub or GitLab and named "master").
In contrast to this, asynchronous systems can work independently from each other, with no central authority. These systems are typically harder to design, since there is a rather large number of cases, and even figuring out how to make a list of cases isn't obvious.
Now of course, project leaders will always want to choose a particular version for release, but this should be dictated by human factors only, not by technical factors.
Q: What is your personal favorite VCS feature?
A: I believe commutativity is the thing every version control system is trying to simulate, with varying degrees of success. Merges and rebases in Git are trying to make things commute (well, when they work), and Darcs does it (except for conflicts). So this is really my favourite feature, and the fact that no one else was doing it rigorously got me into this field.
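To make "commuting" concrete, here is a minimal Python sketch of patch commutation in the spirit of patch theories such as the one in Darcs. It is a toy under stated assumptions, not the algorithm of Darcs, Git or Pijul: the `Insert` patch type and the `apply` and `commute` functions are hypothetical, and only line insertions are modelled. Given a patch `p` followed by a patch `q`, `commute` returns adjusted patches that can be applied in the opposite order with the same result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Toy patch: insert `text` so that it becomes the line at index `pos`."""
    pos: int
    text: str


def apply(lines, patch):
    """Return a new list of lines with the patch applied."""
    return lines[:patch.pos] + [patch.text] + lines[patch.pos:]


def commute(p, q):
    """Given p followed by q, return (q2, p2) such that applying q2 and
    then p2 yields the same document as applying p and then q."""
    if q.pos > p.pos:
        # q inserted below p's line: without p, q's target is one line higher.
        return Insert(q.pos - 1, q.text), Insert(p.pos, p.text)
    # q inserted at or above p's line: applying q first pushes p's line down.
    return Insert(q.pos, q.text), Insert(p.pos + 1, p.text)


doc = ["a", "b", "c"]
p, q = Insert(1, "X"), Insert(3, "Y")
q2, p2 = commute(p, q)

# Both orders produce the same document.
assert apply(apply(doc, p), q) == apply(apply(doc, q2), p2)
print(apply(apply(doc, q2), p2))
```

With insertions only, every pair of patches commutes. In a fuller model, a patch that edits or deletes a line introduced by an earlier patch would fail to commute with it, and that failure is what signals a dependency or a conflict.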
Q: If you could snap your fingers and have any VCS feature magically appear in Pijul, what would it be?
A: This is an excellent question. One thing we do not capture very well yet is keeping the identity of code blocks across refactorings. If you split or merge a file, or swap two functions in a file, then these changes will not commute with changes that happen in parallel.
We are not alone: Darcs doesn't model that at all, and Git and Mercurial use their merge heuristics to try and solve it. But as I said before, these heuristics are not rigorous, and can sometimes reshuffle files in unexpected ways without telling the user, which has non-trivial security implications.
I have ideas on how to do it in Pijul, but they are not fully formalised yet. I think the format is now extensible enough to support them. What was I saying about playing with toy examples and walking outside? ;-)
In this article, we presented a Q&A session with the creator and lead developer of the Pijul project, Pierre-Étienne Meunier.
If you're interested in learning more about how version control systems work under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how version control systems work at the code level. To do this, we documented the first version of Git's code and discussed it in detail.
We hope you enjoyed this post! Feel free to shoot me an email at firstname.lastname@example.org with any questions or comments.