The Distributed Nature of Git and Bitcoin

Introduction

This article is Part 1 in our series on the similarities between Git and Bitcoin. We will discuss distributed systems and how Git and Bitcoin leverage them in their architecture.

It may not seem like it at first glance, but Git, the version control system that (almost) everyone loves, and Bitcoin, the first widely popular cryptocurrency, have a number of striking internal similarities.

Namely, both are distributed, both use a content-addressable data structure for recording data, and both store that data in a Merkle tree.

Distributed architecture

Broadly speaking, one can file things like cryptocurrencies and version control systems into one of two main overarching camps: centralized systems and distributed systems (also called decentralized).

Centralized systems, which in VCS terms include CVS and Subversion, require a single central server. This is the one authority: what it says is law. Nobody else can make changes; they can only ask the central server to make them. For the most part, there is no disputing whose version of the information is the right copy. To make sure you're up to date, just download a fresh copy from the server, throw away the rest, and that is that.

This approach has some downsides. With a single point of contact, there's also a single point of failure. Anyone who's familiar with CVS or Subversion knows that if you lose connection to that server, you lose almost all of your ability to work.

In currency terms, this is as if your bank started responding to everything with "hey, we don't exist". Want to check your balance on $BankName Mobile? Nope. Want to buy a fourth mocha because that project is due in three days and you've been up for five? Or a new pack of stickers to plaster on your laptop? Sorry, the bank doesn't exist anymore.

And, more pressingly for currencies, this means no real tamper resistance. It's not that hard to fiddle the numbers if you only need to do it once, on the one thing that everyone ultimately has to trust. Imagine if some greedy hacker, ne'er-do-well, or script kiddie wanting to cause trouble broke into your preferred currency's central server and gave themselves all the riches in the world. Or set everyone's balance to zero. Or imagine someone with a grudge going in and wiping out some of your transactions. Now there's no proof you actually paid for something, and you can only hope the shop owner doesn't mind.

Distributed systems are ones with, well, no central authority. Take Git, Monotone, or Mercurial as examples. Disregarding code hosting websites that kinda blur the lines, the cores of these systems are built without any concept of a central server. For example, in Git it's completely possible to collaborate on a repository by linking your computer with a co-worker's and pushing and pulling to each other directly.

A major pro of distributed systems is that with no central authority, everyone has their own complete copy of the dataset. When you git clone a repo, you download everything there is, not just the things you need. With every participant having a copy, forging things in the manner described above is much harder to do, since now you'd need to somehow change the same thing for everyone.

However, this does present a problem of conflicting changes. If two copies of the repository disagree, who is right? But that is something that we won't dwell on for the moment.

Do note here, decentralized systems do not require a central server, but they do not disallow one. Again, Git. For average software projects, you might push changes to a company's internal Git server, a random guy on the internet who runs one out of his house for free, or something like GitHub. These are effectively central servers in the eyes of the people, but not the underlying system.

Yes, everything may pass through the central server, and yes, if that goes down, the people might have a small issue, but the system itself will still function. You can still work in Git without an active internet connection; you just can't share your work. Many decentralized systems have at least one "central" server that isn't an authoritative source, but more of a well-known landmark that newcomers can use to catch up on everything that they don't have yet.

Git's distributed architecture

Okay, I've given examples, so now let's give some details. For those unaware, a Git repository is a database of many, many objects. Everyone with a copy (a git clone) has the entire database.

Work sharing is done by updating someone else's database with the changes that you've made, and nothing more. Because everyone has a complete copy, everything can be done offline, and if you lose your copy, someone nearby can just give you theirs, and you're back up and running again.
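
As a rough illustration of that idea, here is a minimal Python sketch. It is not Git's actual implementation: the ObjectStore class and its push_to method are hypothetical names invented for this example. The only detail borrowed from Git itself is how a blob's ID is computed in the default SHA-1 object format (a hash over a "blob <size>" header, a NUL byte, and the content). Each person's copy is modeled as a dictionary of objects, and sharing work means copying over whatever the other side lacks.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git gives a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy stand-in (hypothetical) for one person's object database."""

    def __init__(self):
        self.objects = {}  # object ID -> content

    def add(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects[oid] = content
        return oid

    def push_to(self, other: "ObjectStore") -> int:
        """Copy only the objects the other store lacks; return how many."""
        missing = self.objects.keys() - other.objects.keys()
        for oid in missing:
            other.objects[oid] = self.objects[oid]
        return len(missing)


alice, bob = ObjectStore(), ObjectStore()
alice.add(b"hello\n")
bob.add(b"hello\n")                    # both already share this object
alice.add(b"a change only Alice has\n")
sent = alice.push_to(bob)              # only the new object is transferred
print(sent, alice.objects == bob.objects)
```

Because objects are keyed by a hash of their content, the two stores can tell which objects they already share without comparing the contents themselves.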

Bitcoin's distributed architecture

Bitcoin, at its core, is a distributed system.

  • Everyone has a complete copy of the ledger of all transactions (the blockchain).
  • All transactions are peer-to-peer; you don't need to gain authorization from a central authority.
  • Anybody is allowed to inspect (read), or contribute to, the network.
  • The folks that verify and maintain the network - the Bitcoin miners - are free to join and leave at will, and not appointed by any central source.

But... let's look into that in a little more detail.

Digital currencies and Bitcoin

With normal fiat currencies, the bank decides everyone's balances. If I open the bank's app on my phone, I am told how much money I have available to spend at any given moment, not through validation or peer confirmation but because the bank, after looking through their records, has said that's how much I have. (And, side note, before you point out there being multiple banks, every real physical currency has at least one "bank" that's the central authority on that currency. It has to end somewhere).

With Bitcoin, this is not true. A user has Bitcoin because the public ledger of transactions says that, after all the profits and losses, the end result of all that addition and subtraction canceling out is... their balance. If I want to change this, I have to add the change to the network, and it needs to be accepted by the majority of the network. I don't ask the bank to do the transfer (as with a credit/debit card). I just create the transaction and send it to the network, and soon it will be verified and seen as legitimate.
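
To make the "balance is whatever the ledger adds up to" idea concrete, here is a small Python sketch. It is a deliberate simplification: the ledger entries, the names, and the balances function are all made up for illustration, and real Bitcoin tracks unspent transaction outputs rather than per-account running totals. The point is only that nobody stores the balance; it is derived by replaying the public history.

```python
from collections import defaultdict

# Hypothetical ledger entries: (sender, recipient, amount).
# A sender of None stands in for newly issued coins.
ledger = [
    (None, "alice", 50),
    ("alice", "bob", 20),
    ("bob", "carol", 5),
    ("alice", "carol", 10),
]


def balances(entries):
    """Replay every entry and return each participant's resulting balance."""
    totals = defaultdict(int)
    for sender, recipient, amount in entries:
        if sender is not None:
            totals[sender] -= amount
        totals[recipient] += amount
    return dict(totals)


result = balances(ledger)
print(result)  # {'alice': 20, 'bob': 15, 'carol': 15}
```

Anyone holding the same ledger and running the same replay arrives at the same numbers, which is why no bank is needed to announce them.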

Transaction validation

The folks that verify and validate Bitcoin transactions are known as Bitcoin miners. They are an open (and expanding) group that anyone is free to join. People may join or leave as they wish, and (in theory) anyone in the network may participate in verifying the state of the network.

Network consensus

Making sure that everyone sees the same balance for each user is a big deal. For currencies, if two clients disagree, that's an issue.

But because everything is public, in theory, the worst that will happen is that you have a slightly out-of-date number that will fix itself shortly. Without a central authority, you don't need to trust any single entity to ensure valid balances. This is all done in a decentralized fashion by the network of peers.

In this sense, "right" and full consensus are just "what the majority agrees with."
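
A toy Python sketch of that "majority" notion follows. It is only a model of the idea as stated here, under the assumption that each peer gets one vote; Bitcoin itself does not count peers this way, and nodes instead follow the valid chain with the most accumulated proof of work. The function names and sample data are hypothetical.

```python
import hashlib
from collections import Counter


def ledger_fingerprint(entries) -> str:
    """Hash a ledger so that copies can be compared cheaply."""
    return hashlib.sha256(repr(entries).encode()).hexdigest()


def majority_view(peer_ledgers):
    """Return the ledger held by more than half of the peers, or None."""
    counts = Counter(ledger_fingerprint(l) for l in peer_ledgers.values())
    fingerprint, votes = counts.most_common(1)[0]
    if votes * 2 <= len(peer_ledgers):
        return None  # no strict majority, so no agreed-upon view
    for candidate in peer_ledgers.values():
        if ledger_fingerprint(candidate) == fingerprint:
            return candidate


honest = [(None, "alice", 50), ("alice", "bob", 20)]
tampered = [(None, "mallory", 1_000_000)]
peers = {"p1": honest, "p2": honest, "p3": tampered}

agreed = majority_view(peers)
print(agreed == honest)
```

In this model, a single peer rewriting its own copy changes nothing for everyone else; it would have to change more than half of the copies to alter what the network treats as "right".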

Conclusion

In sum, both Git and Bitcoin leverage distributed architectures with no required central authority. Git's distributed nature is a means to allow efficient team collaboration on digital projects. Bitcoin uses decentralization to remove the need to trust a single party when it comes to currency transactions and decisions.

In Part 2 of this series, we'll learn about Merkle trees and how Git and Bitcoin both make use of them.

For a comprehensive overview of Bitcoin, I highly recommend the book Mastering Bitcoin by Andreas M. Antonopoulos, published by O'Reilly Media.

If you're interested in learning how Bitcoin's code works, check out our Baby Bitcoin Guidebook for Developers.
