Learn Git - Baby Git Staging Index
Baby Git Staging Index
As part of Git's initial commit, Linus Torvalds created the staging index. The staging index is also referred to as the current directory cache or, in more modern terminology, Git's staging area. As Git users know, the staging area is where versions of changed files are added as a precursor to a commit. In modern-day Git, files are added to the staging index via the git add <file> command and removed from it using the git restore --staged <file> command (or git reset HEAD <file> in older versions). Back in the old days of Baby Git, files were staged using the update-cache <file> command.
The staging index is simply a binary file named index. It is located in the hidden .dircache folder at the root of the project, that is, at .dircache/index.
This index file is nothing more than a cache. When a Baby Git repository is initialized using the init-db command, no index file exists yet since nothing has been added to the cache. It is effectively an empty cache. As the user creates and updates files in the working directory, and then tracks those files in Git using the update-cache <file> command, the index file is created and updated.
So what exactly does the staging index contain? To answer that question, we need to understand how Git tracks content. Let's assume we start with an empty project root folder called project. Inside this folder, we run the init-db command to initialize the Git repository. This creates the object database folder structure. At this point we have a brand new Git repository with no tracked content.
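To make the "object database folder structure" concrete, here is a minimal Python sketch of what an init-db-like step might do. The layout it creates (an objects folder inside .dircache with one subfolder per two-digit hex prefix, 00 through ff) is an assumption made for illustration, and init_db is a hypothetical helper, not Baby Git's actual C code.

```python
import os
import tempfile


def init_db(root):
    """Create an (assumed) Baby-Git-style object database under `root`."""
    objects_dir = os.path.join(root, ".dircache", "objects")
    os.makedirs(objects_dir, exist_ok=True)
    # Assumed layout: one subfolder per possible first byte of a hash.
    for i in range(256):
        os.makedirs(os.path.join(objects_dir, f"{i:02x}"), exist_ok=True)
    return objects_dir


if __name__ == "__main__":
    project = tempfile.mkdtemp()  # stands in for the empty "project" folder
    objects_dir = init_db(project)
    print(len(os.listdir(objects_dir)), "object subfolders created")
    # No index file exists yet: nothing has been added to the cache.
    print(os.path.exists(os.path.join(project, ".dircache", "index")))
```

Note that the sketch never creates an index file, which matches the point made earlier: the cache starts out empty.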
Let's create a file in our working directory called test.txt containing the word test and tell Git to track it using the command update-cache test.txt. This does four things:
- Compresses the contents of test.txt and then hashes the result
- Stores the content as a blob in the object database, indexed by its hash value
- Creates a tree object that references the blob, including its name, type, and permissions
- Adds this tree to the staging index
The index file now exists in the .dircache folder.
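The first two steps in the list above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a port of Baby Git's code: store_blob is a hypothetical helper, the "blob <size>" header terminated by a NUL byte is an assumed object format, and the xx/yyyy file layout inside the object database is likewise assumed. The one detail taken directly from the text is the order of operations: compress first, then hash.

```python
import hashlib
import os
import tempfile
import zlib


def store_blob(objects_dir, content):
    """Compress `content`, hash the compressed bytes, store them by hash."""
    # Assumed header: object type and size, terminated by a NUL byte.
    header = b"blob " + str(len(content)).encode() + b"\0"
    compressed = zlib.compress(header + content, 9)
    # The hash is computed over the compressed bytes, per the steps above.
    sha1 = hashlib.sha1(compressed).hexdigest()
    # Assumed layout: first two hex digits pick the subfolder.
    subdir = os.path.join(objects_dir, sha1[:2])
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, sha1[2:])
    with open(path, "wb") as f:
        f.write(compressed)
    return sha1, path


if __name__ == "__main__":
    objects_dir = tempfile.mkdtemp()  # stands in for the object database
    sha1, path = store_blob(objects_dir, b"test\n")
    print(sha1, "->", path)
```

Because the file name is derived from the content, storing the same content twice yields the same hash and the same path, which is what makes the object database content-addressable.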
Behind the scenes, Git's code added a so-called cache entry to an array called active_cache. If we add another file to our working directory called test2.txt, we can add that to the index in the same way by running update-cache test2.txt. This runs through the same four steps as above, adding the second file's blob to the object database, its tree to the index, and a new cache entry to the active_cache array.
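As a rough mental model of the active_cache array, the Python sketch below keeps a list of cache entries sorted by file name and replaces an entry when the same file is staged again. The CacheEntry fields, the sorted-by-name ordering, and the replace-on-duplicate behavior are assumptions made for illustration; add_cache_entry here is a hypothetical Python function, and the hash values are placeholders.

```python
import bisect
from dataclasses import dataclass


@dataclass
class CacheEntry:
    name: str  # path of the staged file
    sha1: str  # hash of the stored blob (placeholder values below)
    mode: int  # file permissions


def add_cache_entry(active_cache, entry):
    """Insert `entry` into the list, keeping it sorted by name (assumed)."""
    names = [e.name for e in active_cache]
    pos = bisect.bisect_left(names, entry.name)
    if pos < len(active_cache) and active_cache[pos].name == entry.name:
        active_cache[pos] = entry  # same file staged again: replace
    else:
        active_cache.insert(pos, entry)
    return pos


if __name__ == "__main__":
    active_cache = []
    add_cache_entry(active_cache, CacheEntry("test2.txt", "bb" * 20, 0o100644))
    add_cache_entry(active_cache, CacheEntry("test.txt", "aa" * 20, 0o100644))
    print([e.name for e in active_cache])
```

Keeping the entries ordered makes lookups by name cheap and gives every later step a deterministic order to walk the staged files in.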
Once the staged changes are ready, the write-tree command can be used to store a new tree in the object database. This tree represents the state of all files staged in the index.
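One way to picture what a write-tree-like step produces is the Python sketch below, which serializes a list of staged entries into a single tree buffer. The byte layout used here (a "tree <size>" header ending in a NUL byte, then for each file its octal mode, a space, its name, a NUL byte, and the 20 raw hash bytes) is an assumption for illustration, and build_tree is a hypothetical helper. The hash values are placeholders.

```python
def build_tree(entries):
    """Serialize (mode, name, sha1_hex) tuples into one tree buffer."""
    body = b""
    # Walk the entries in name order so the result is deterministic.
    for mode, name, sha1_hex in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode:o} {name}".encode() + b"\0" + bytes.fromhex(sha1_hex)
    # Assumed header: object type and body size, terminated by a NUL byte.
    return b"tree " + str(len(body)).encode() + b"\0" + body


if __name__ == "__main__":
    staged = [
        (0o100644, "test2.txt", "bb" * 20),  # placeholder hashes
        (0o100644, "test.txt", "aa" * 20),
    ]
    tree = build_tree(staged)
    print(len(tree), "bytes in the serialized tree")
```

The resulting buffer would then be compressed, hashed, and stored in the object database in the same way as a blob, and the hash of that stored tree is what the next command takes as input.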
Finally, a commit object can be created based on this new tree by using the commit-tree <sha1> command. This command takes the SHA-1 hash of the index's tree as an argument and creates a new commit object that includes the author, email, date, and a user-specified commit message.
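The commit object described above can be pictured as a small block of text that points at a tree. The Python sketch below assembles one from the fields the paragraph names (tree hash, author, email, date, message). The exact text layout is a hypothetical one chosen for readability, build_commit is not a real Git function, and the tree hash, name, email, and date are placeholder values.

```python
def build_commit(tree_sha1, author, email, date, message):
    """Assemble the text of a commit object (hypothetical layout)."""
    lines = [
        f"tree {tree_sha1}",                  # the tree produced by write-tree
        f"author {author} <{email}> {date}",  # who made the commit, and when
        "",                                   # blank line before the message
        message,
    ]
    return "\n".join(lines).encode()


if __name__ == "__main__":
    commit = build_commit(
        "aa" * 20,             # placeholder tree hash
        "Example Author",      # placeholder author
        "author@example.com",  # placeholder email
        "Thu Apr 7 2005",      # placeholder date
        "Add test files",
    )
    print(commit.decode())
```

Like blobs and trees, this buffer would be stored in the object database under its own hash, so a commit is just one more object that references a tree.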
In this article, we discussed what Baby Git's staging index is and how it is used in the process of making commits in the repository.
If you're interested in learning more about how Git works under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers who want to learn how Git works at the code level. To do this, we documented the first version of Git's code and discuss it in detail.