Baby Git Staging Index

As part of Git's initial commit, Linus Torvalds created the staging index. The staging index is also referred to as the current directory cache or, in more modern terminology, Git's staging area. As Git users know, the staging area is where versions of changed files are added as a precursor to a commit. In modern-day Git, files are added to the staging index with the git add <file> command and unstaged with the git restore --staged <file> command (or git reset HEAD <file> in older versions). Note that git checkout -- <file> does something different: it discards changes in the working directory rather than removing a file from the index. Back in the old days of Baby Git, files were staged using the update-cache <file> command.

The staging index is simply a binary file named index. It is located in the hidden .dircache folder at the root of the project, at the path .dircache/index.


This index file is nothing more than a cache. When a Baby Git repository is initialized using the init-db command, no index file exists yet since nothing has been added to the cache. It is effectively an empty cache. As the user creates and updates files in the working directory, and then tracks those files in Git using the update-cache <file> command, the index file is created and updated.

So what exactly does the staging index contain? To answer that question, we need to understand how Git tracks content. Let's assume we start with an empty project root folder called project. Inside this folder, we run the init-db command to initialize the Git repository. This creates the object database folder structure. At this point we have a brand new Git repository with no tracked content.
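To make the starting point concrete, here is a minimal Python sketch of what init-db leaves on disk. It is an illustration rather than Git's actual code: the init_db function is a hypothetical stand-in, and it assumes the object database lives in .dircache/objects with one subdirectory per two-digit hex prefix (00 through ff).

```python
import os
import tempfile


def init_db(root):
    """Create a Baby Git style object database under `root` (illustrative sketch)."""
    objects_dir = os.path.join(root, ".dircache", "objects")
    # Assumed layout: one subdirectory per possible first byte of a hash.
    for i in range(256):
        os.makedirs(os.path.join(objects_dir, f"{i:02x}"), exist_ok=True)
    return objects_dir


# Use a throwaway directory as the empty project root.
project = tempfile.mkdtemp(prefix="project-")
objects_dir = init_db(project)
index_path = os.path.join(project, ".dircache", "index")

print(len(os.listdir(objects_dir)))  # 256
print(os.path.exists(index_path))    # False: nothing has been staged yet
```

Note that the sketch never creates an index file, which matches the empty cache described above.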

Let's create a file in our working directory called test.txt containing the word test and tell Git to track it using the command update-cache test.txt. This does four things:

  1. Compresses the contents of test.txt and then hashes the compressed result
  2. Stores the content as a blob in the object database, indexed by its hash value
  3. Creates a cache entry that references the blob by its hash and records the file's name, permissions, and other metadata
  4. Adds this cache entry to the staging index. We can now see that the index file was created as .dircache/index

Behind the scenes, Git's code adds a so-called cache entry to an array called active_cache, which is then written out to the index file. No tree object exists at this stage; trees are only created later by the write-tree command. If we add another file to our working directory called test2.txt, we can add that to the index in the same way by running update-cache test2.txt. This runs through the same four steps as above, adding the second file's blob to the object database and a new cache entry to active_cache and the index.
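The staging steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Baby Git's real code: an in-memory dict stands in for the object database, a plain list stands in for active_cache, the "blob <size>" header layout is assumed, and the update_cache and store_blob helpers are hypothetical names.

```python
import hashlib
import zlib


def store_blob(objects, content):
    """Compress the content (with an assumed header), then hash the compressed bytes."""
    header = b"blob %d\0" % len(content)          # assumed header layout
    compressed = zlib.compress(header + content, 9)
    sha1 = hashlib.sha1(compressed).hexdigest()   # hash is taken after compression
    objects[sha1] = compressed                    # object database keyed by hash
    return sha1


def update_cache(active_cache, objects, name, content, mode=0o100644):
    """Stage one file: store its blob and record a cache entry for it."""
    sha1 = store_blob(objects, content)
    entry = {"name": name, "mode": mode, "size": len(content), "sha1": sha1}
    # Replace any older entry for the same path and keep entries sorted by name.
    others = [e for e in active_cache if e["name"] != name]
    active_cache[:] = sorted(others + [entry], key=lambda e: e["name"])
    return entry


objects = {}        # stand-in for .dircache/objects
active_cache = []   # stand-in for the in-memory cache written to .dircache/index

update_cache(active_cache, objects, "test.txt", b"test\n")
update_cache(active_cache, objects, "test2.txt", b"test 2\n")

print([e["name"] for e in active_cache])  # ['test.txt', 'test2.txt']
```

Each cache entry carries the blob's hash, so the index can always point back to the exact content that was staged.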

Once the staged changes are ready, the write-tree command can be used to store a new tree object in the object database. This tree represents the state of all files staged in the index at that moment.
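As a rough illustration of that step, the sketch below builds one tree object from a list of staged entries. The entry format (octal mode, name, null byte, raw 20-byte hash), the "tree <size>" header, and the write_tree helper are assumptions made for the example, and the blob hashes are placeholders rather than real object names.

```python
import hashlib
import zlib


def write_tree(staged_entries, objects):
    """Store one tree object describing every staged entry; return its hash."""
    body = b""
    for entry in staged_entries:
        # Assumed entry layout: "<octal mode> <name>\0" followed by the raw hash.
        line = f"{entry['mode']:o} {entry['name']}".encode() + b"\0"
        body += line + bytes.fromhex(entry["sha1"])
    data = b"tree %d\0" % len(body) + body        # assumed header layout
    compressed = zlib.compress(data, 9)
    sha1 = hashlib.sha1(compressed).hexdigest()
    objects[sha1] = compressed
    return sha1


# Placeholder blob hashes; in practice these come from staging the files.
staged_entries = [
    {"name": "test.txt", "mode": 0o100644,
     "sha1": hashlib.sha1(b"placeholder blob 1").hexdigest()},
    {"name": "test2.txt", "mode": 0o100644,
     "sha1": hashlib.sha1(b"placeholder blob 2").hexdigest()},
]

objects = {}
tree_sha1 = write_tree(staged_entries, objects)
print(tree_sha1)  # 40 hex characters identifying the tree
```

Because the tree is built purely from the staged entries, staging the same files with the same content yields the same tree hash.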

Finally, a commit object can be created from this new tree using the commit-tree <sha1> command. This command takes the SHA-1 hash of the index's tree as an argument and creates a new commit object that includes the author, email, date, and a user-specified commit message.
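Here is a hedged sketch of what such a commit object might contain. The field order, the "commit <size>" header, and the commit_tree helper are assumptions for illustration only; the tree hash, author, email, and date below are made-up sample values.

```python
import hashlib
import zlib


def commit_tree(objects, tree_sha1, author, email, date, message):
    """Store a commit object that points at a tree; return the commit's hash."""
    body = (
        f"tree {tree_sha1}\n"
        f"author {author} <{email}> {date}\n"
        f"\n"
        f"{message}\n"
    ).encode()
    data = b"commit %d\0" % len(body) + body      # assumed header layout
    compressed = zlib.compress(data, 9)
    sha1 = hashlib.sha1(compressed).hexdigest()
    objects[sha1] = compressed
    return sha1


objects = {}
tree_sha1 = hashlib.sha1(b"hypothetical tree").hexdigest()  # sample value only
commit_sha1 = commit_tree(
    objects,
    tree_sha1,
    author="Example Author",
    email="author@example.com",
    date="Thu Apr 7 15:13:13 2005",
    message="Add test.txt and test2.txt",
)

# Inspect the stored commit by decompressing it again.
print(zlib.decompress(objects[commit_sha1]).decode(errors="replace"))
```

The commit stores only the tree's hash, not the files themselves, which is why the tree has to be written before the commit can be created.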

In this article, we discussed what Baby Git's staging index is and how it is used in the process of making commits in the repository.

Next Steps

If you're interested in learning more about how Git works under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers who want to learn how Git works at the code level. To do this, we documented the first version of Git's code and discussed it in detail.