What is a commit in Git?

Introduction

Commits are one of Git's most important concepts, yet they are often misunderstood or only partially understood.

In this article, we'll explain everything you need to know about Git commits.

This is Part 3 of our series on Git's main object types, which started with Part 1, "What is a blob in Git?", and continued with Part 2, "What is a tree in Git?".

What is a Git commit?

A Git commit is a snapshot of your project's tracked files, as recorded in the staging area, at a specific point in time, made by a specific author.

Git keeps a record of every commit made in your project. By comparing these snapshots, it can reconstruct the full history of every change ever made to each file.

This allows Git users to check out any previous version of the project using the command git checkout <commit-id>.

It also lets developers track exactly who made every change, down to individual lines of code. For example, the git blame command shows, for each line of a file, the commit and author that last modified it.

What are commit parents?

Git commits use the notion of parents to link commits together in chains.

The first commit in a Git repository is called the root commit, and it has no parents. This makes sense: when it is created, there are no existing commits that could be its parent.

Normally, every commit made after the root commit has at least one parent. But what does it really mean for a commit to have a parent?

When you create a new commit with the git commit command, the new commit is added to the tip of your current branch. The new commit stores a reference (the commit ID) to the commit that was previously at the tip. In this way, each commit links back to the commit before it on the branch, which is known as its parent.
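To make parent links concrete, here is a minimal Python sketch. It models each commit only by its ID and a tuple of parent IDs, which is a deliberate simplification (real commits carry much more data), and the helper names `ParentMap` and `first_parent_history` are hypothetical. Starting at a branch tip and repeatedly following the first parent is roughly what `git log --first-parent` does. The two shortened IDs come from the example later in this article.

```python
from typing import Dict, List, Tuple

# Simplified, hypothetical model: commit ID -> tuple of parent commit IDs.
# A root commit has an empty tuple because it has no parents.
ParentMap = Dict[str, Tuple[str, ...]]


def first_parent_history(parents: ParentMap, tip: str) -> List[str]:
    """Walk from a branch tip back to the root, following first parents."""
    history = [tip]
    current = tip
    while parents[current]:  # stop once we reach a commit with no parents
        current = parents[current][0]
        history.append(current)
    return history


parents: ParentMap = {
    "72e2a98": (),            # root commit
    "6477c34": ("72e2a98",),  # its only parent is the root commit
}

print(first_parent_history(parents, "6477c34"))  # ['6477c34', '72e2a98']
```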

How can a commit have multiple parents?

When you use the git merge command to combine changes from multiple branches, Git will usually create a merge commit.

A merge commit has 2 or more parents, because it combines the changes from multiple commits, each representing the state of one of the branches being merged.

A merge commit stores references to all of the parent commits being merged, which is usually 2. When Git merges changes from more than 2 branches at once, it's called an octopus merge.
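In a raw commit object, each parent is recorded on its own header line starting with `parent `, and the header ends at the first blank line. The Python sketch below counts those lines to tell root, ordinary, merge, and octopus merge commits apart. It assumes the commit text has already been decompressed and its `commit <size>\0` prefix removed; the helper names and the commit text are hypothetical, with short placeholders standing in for full 40-digit IDs.

```python
from typing import List


def parent_ids(commit_text: str) -> List[str]:
    """Collect the IDs from the 'parent' lines of a commit's header."""
    parents = []
    for line in commit_text.split("\n"):
        if line == "":  # a blank line ends the header; the message follows
            break
        if line.startswith("parent "):
            parents.append(line[len("parent "):])
    return parents


def describe(commit_text: str) -> str:
    """Name the kind of commit based on how many parents it has."""
    count = len(parent_ids(commit_text))
    if count == 0:
        return "root commit"
    if count == 1:
        return "ordinary commit"
    if count == 2:
        return "merge commit"
    return "octopus merge commit"


# Hypothetical commit text; t1, p1, p2, p3 stand in for real hashes.
octopus = "\n".join([
    "tree t1",
    "parent p1",
    "parent p2",
    "parent p3",
    "author A U Thor <author@example.com> 1700000000 +0000",
    "committer A U Thor <author@example.com> 1700000000 +0000",
    "",
    "Merge three branches",
])
print(describe(octopus))  # octopus merge commit
```

Stopping at the first blank line matters: a commit message that happens to contain a line beginning with `parent ` is not mistaken for a header.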

Where are Git commits stored?

Like other Git objects such as blobs and trees, commit objects are stored in Git's object database (also called the object store). It resides inside your repository, in the .git/objects/ directory at your project root.
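Inside .git/objects/, Git stores each loose (individually compressed) object in a subdirectory named after the first 2 hex digits of its ID, with the remaining 38 digits as the file name. Objects can also be bundled into packfiles under .git/objects/pack/. The Python sketch below, using a hypothetical helper `list_loose_objects`, lists the loose object IDs under a given .git directory. It does not look inside packfiles, so on a repository that has been packed (for example by git gc) it will miss objects.

```python
import os
import string
from typing import List

# The 16 lowercase hex digits that appear in object IDs.
HEX_DIGITS = set(string.hexdigits.lower())


def list_loose_objects(git_dir: str) -> List[str]:
    """Return the IDs of loose objects stored under <git_dir>/objects."""
    objects_dir = os.path.join(git_dir, "objects")
    ids = []
    for entry in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, entry)
        # Loose objects live in two-hex-digit "fan-out" directories; this
        # check skips 'pack' (packfiles) and 'info' (metadata).
        if len(entry) != 2 or not set(entry) <= HEX_DIGITS:
            continue
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            ids.append(entry + name)  # directory name + file name = full ID
    return ids
```

Run against the testrepo/.git directory from the example later in this article, the result should include the commit 72e2a98ec8d43f2547b31583f7a6a116eaff82e1 alongside the blob and tree objects it refers to.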

What is the Git commit format?

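Git builds each commit object in a memory buffer using the following format:

commit <size-of-commit-data-in-bytes>\0
tree <tree-SHA1-hash>
parent <parent-1-commit-id>
parent <parent-2-commit-id>
...
parent <parent-N-commit-id>
author <name> <email> <timestamp> <timezone>
committer <name> <email> <timestamp> <timezone>

<commit message>

This format starts with a header containing the object's type, which is "commit", followed by a space, the size of the commit data in bytes, and a null byte. The commit data itself begins immediately after the null byte; it is shown on a new line above only for readability.

Next is the line starting with tree, which holds the SHA-1 hash of the root tree being committed. The root tree is built from the contents of the staging area, so it represents the full set of files and folders included in the commit, not only the files you changed. Unchanged files and folders simply reuse their existing blob and tree objects.

Then comes one parent line per parent commit ID, which, as previously mentioned, is how a commit stores references to its parents. A root commit has no parent lines, a regular commit has one, and a merge commit has two or more. These links are what chain commits together into the history that Git branches point to.

Finally come the author and committer lines, each holding a name, an email address in angle brackets, a Unix timestamp, and a timezone offset such as -0700, followed by a blank line and the commit message. The author wrote the change and the committer created the commit object. They are usually the same, but the committer details are updated when a commit is re-applied, for example by git cherry-pick or git rebase.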
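The format above can be reproduced in a few lines of Python. This sketch builds the commit data (the part after the `commit <size>\0` header), then prefixes the header and hashes the result with SHA-1, which is how Git derives object IDs in SHA-1 repositories. The helper names are hypothetical, the tree is Git's well-known empty tree, and the author details and timestamp are made up, so the resulting ID will not match any commit shown in this article.

```python
import hashlib
from typing import Sequence


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 of '<type> <size>\\0' followed by the object's content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def commit_content(tree: str, parents: Sequence[str], author: str,
                   committer: str, message: str) -> bytes:
    """Build commit data: header lines, a blank line, then the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]  # zero or more
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    if not message.endswith("\n"):
        message += "\n"  # git commit stores messages with a trailing newline
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Made-up identity: name, <email>, Unix timestamp, UTC offset.
SIGNATURE = "Jane Doe <jane@example.com> 1700000000 +0000"

content = commit_content(EMPTY_TREE, [], SIGNATURE, SIGNATURE, "Initial commit")
print(content.decode())
print(object_id("commit", content))
```

The same `object_id` function reproduces familiar IDs for other object types: hashing an empty blob gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, and an empty tree gives the EMPTY_TREE value used above.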

When does Git create commits?

Most new commits are created with the git commit command, which uses the currently checked-out commit (HEAD) as the new commit's parent. Usually this is the tip of the current branch, in which case Git creates a new commit pointing back to that tip and then updates the branch ref to point to the new commit.

Git can also create new commits when you run git merge. Specifically, this happens when merging two branches that have diverged, i.e. each branch has new commits that don't exist on the other branch.

However, sometimes Git is able to perform a fast-forward merge, which doesn't result in a merge commit. This occurs when the branch being merged into is an ancestor of the branch being merged. In this case, the branch ref being merged into can simply be "fast-forwarded" to the same commit as the branch being merged.
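Whether a merge can be fast-forwarded comes down to an ancestry check. The Python sketch below uses a simplified, hypothetical commit-to-parents map with one-letter IDs and mirrors the decision git merge makes by default. It ignores options such as `--no-ff`, which forces a merge commit even when a fast-forward is possible.

```python
from collections import deque
from typing import Dict, Tuple

ParentMap = Dict[str, Tuple[str, ...]]


def is_ancestor(parents: ParentMap, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable by following parent links from
    `descendant` (a commit counts as its own ancestor here)."""
    seen = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            queue.extend(parents[commit])
    return False


def merge_outcome(parents: ParentMap, current_tip: str, other_tip: str) -> str:
    """Describe what merging `other_tip` into the current branch would do."""
    if is_ancestor(parents, other_tip, current_tip):
        return "already up to date"
    if is_ancestor(parents, current_tip, other_tip):
        return "fast-forward"  # just move the current branch ref forward
    return "merge commit"      # histories diverged: a new commit is needed


# Hypothetical history: A <- B <- C <- D, plus E branching off B.
graph: ParentMap = {
    "A": (), "B": ("A",), "C": ("B",), "D": ("C",), "E": ("B",),
}
print(merge_outcome(graph, "B", "D"))  # fast-forward
print(merge_outcome(graph, "E", "D"))  # merge commit
print(merge_outcome(graph, "D", "B"))  # already up to date
```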

How are Git commits stored?

Git commits are stored the same way as all other Git objects such as blobs and trees. Git's code builds up the commit in a memory buffer in the format shown above.

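Next, the commit data is hashed with SHA-1. Git's original code used the OpenSSL SHA library for this; modern Git releases default to a SHA-1 implementation with collision detection, but the resulting hash is the same.

Then the Zlib library is used to compress the commit's content for efficient storage, and the compressed commit is written to a new file in Git's object database. The file is named after the SHA-1 value calculated in the previous step: the first 2 hex digits become a subdirectory name under .git/objects/, and the remaining 38 digits become the file name.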
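Here is a simplified Python version of those steps, using the standard library's `hashlib` and `zlib` modules in place of OpenSSL and the Zlib C library. The helper names are hypothetical. It hashes the uncompressed header plus content, compresses those same bytes, and writes them to `objects/<first 2 hex digits>/<remaining 38>` under a given Git directory. Real Git handles more details (such as writing through a temporary file and marking objects read-only), so treat this as an illustration and point it at a scratch directory, not a real repository.

```python
import hashlib
import os
import zlib
from typing import Tuple


def write_loose_object(git_dir: str, obj_type: str, content: bytes) -> str:
    """Hash, compress, and store an object; return its SHA-1 ID."""
    data = f"{obj_type} {len(content)}".encode() + b"\0" + content
    object_id = hashlib.sha1(data).hexdigest()  # hash the uncompressed bytes
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    if not os.path.exists(path):  # identical content is only stored once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return object_id


def read_loose_object(git_dir: str, object_id: str) -> Tuple[str, bytes]:
    """Decompress a stored object and split off its '<type> <size>' header."""
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")
    obj_type = header.split(b" ", 1)[0].decode()
    return obj_type, content
```

Because Git's loose objects use this same zlib-compressed layout, `read_loose_object("testrepo/.git", "72e2a98ec8d43f2547b31583f7a6a116eaff82e1")` should return the raw data of the first commit from the example later in this article, as long as that object has not been moved into a packfile.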

Does Git delete commits?

No. Git won't delete your commits unless they become orphaned, in which case they can eventually be cleaned up by Git's garbage collection process, typically only after any reflog entries that still point to them have expired.

An orphaned commit is one that is not reachable from any branch or tag. This can happen if you check out a commit directly by its ID, for example an older commit on a branch, which puts Git into a detached HEAD state. If you create a new commit at this point and then switch away without creating a branch (or tag) that points to it, the new commit will be orphaned.
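Reachability is what decides whether a commit is orphaned. This Python sketch walks parent links from every ref in a hypothetical repository and reports the commits it never reaches. It ignores HEAD and the reflog, which real Git also treats as starting points, so it is quicker to call a commit orphaned than Git's garbage collection would be. The helper names and one-letter IDs are made up.

```python
from typing import Dict, Iterable, Set, Tuple

ParentMap = Dict[str, Tuple[str, ...]]


def reachable(parents: ParentMap, starts: Iterable[str]) -> Set[str]:
    """All commits reachable from `starts` by following parent links."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def orphaned(parents: ParentMap, refs: Dict[str, str]) -> Set[str]:
    """Commits that no branch or tag can reach."""
    return set(parents) - reachable(parents, refs.values())


# Hypothetical repo: main points at B; C was committed on a detached HEAD
# on top of A and then abandoned without creating a branch.
graph: ParentMap = {"A": (), "B": ("A",), "C": ("A",)}
refs = {"refs/heads/main": "B", "refs/tags/v1.0": "A"}
print(orphaned(graph, refs))  # {'C'}
```

Adding a branch that points at C, which is what git branch <name> would do while still on the detached HEAD, makes it reachable again.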

Can Git reuse commits?

Git doesn't really reuse commits the way it reuses other objects like blobs and trees. The reason is that a commit's content includes time-specific and user-specific data, such as the date and time the commit was made, the author's name and email, and the commit message.

Since a commit's SHA-1 hash covers all of this data, along with its tree and parent commit IDs, it is extremely unlikely that exactly the same commit, and therefore the same SHA-1, would ever be created again.

However, commits can be re-applied to different branches in various ways using commands like git rebase and git cherry-pick. Note that this will usually result in a new commit ID being generated for the commit.
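The difference between reusable blobs and non-reusable commits shows up directly in the hashes. In this Python sketch (hypothetical helpers, a made-up author, made-up timestamps, and placeholder parent IDs), hashing the same file content twice gives the same blob ID. Two commits that differ only in their timestamp, or only in their parent, as when the same change is re-applied on top of a different commit, get different IDs. Only a commit rebuilt from exactly the same inputs gets the same ID.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' plus content, as Git computes IDs."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def commit_id(tree: str, parent: str, timestamp: int, message: str) -> str:
    """ID of a single-parent commit with a made-up author and committer."""
    who = f"Jane Doe <jane@example.com> {timestamp} +0000"
    body = (f"tree {tree}\nparent {parent}\n"
            f"author {who}\ncommitter {who}\n\n{message}\n")
    return object_id("commit", body.encode())


blob_1 = object_id("blob", b"asdf\n")
blob_2 = object_id("blob", b"asdf\n")
print(blob_1 == blob_2)  # True: same content, same blob, stored once

tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree
parent = "1" * 40  # placeholder parent ID
first = commit_id(tree, parent, 1700000000, "Same message")
later = commit_id(tree, parent, 1700000001, "Same message")     # 1 s later
moved = commit_id(tree, "2" * 40, 1700000000, "Same message")   # new parent
again = commit_id(tree, parent, 1700000000, "Same message")     # identical
print(first == later, first == moved, first == again)  # False False True
```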

How do I commit a file in Git?

You can commit a new untracked file in Git by using the git add and git commit commands as follows:

$ git add <new-file-name>
$ git commit -m "Write your commit message here"

Git commit example

Let's go through a quick example to illustrate the standard commit workflow:

  1. Create and initialize the Git repo:
$ mkdir testrepo
$ cd testrepo
$ git init
Initialized empty Git repository...
  2. Create a new file and run git status:
$ touch newfile.txt

$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	newfile.txt

nothing added to commit but untracked files present (use "git add" to track)
  3. Run git add to add the file to the staging area, followed by git status:
$ git add newfile.txt

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   newfile.txt
  4. Run git commit followed by git status and git log to examine the first commit. Note the (root-commit) label in the git commit output, which shows that this commit has no parents:
$ git commit -m "Add newfile"
[master (root-commit) 72e2a98] Add newfile
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 newfile.txt

$ git status
On branch master
nothing to commit, working tree clean

$ git log
commit 72e2a98ec8d43f2547b31583f7a6a116eaff82e1 (HEAD -> master)
Author: Jacob Stopak <jacob@initialcommit.io>
Date:   Sun Sep 18 12:28:07 2022 -0700

    Add newfile
  5. Modify the file and run git status:
$ echo "asdf" > newfile.txt

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   newfile.txt

no changes added to commit (use "git add" and/or "git commit -a")
  6. Run git add followed by git commit:
$ git add newfile.txt

$ git commit -m "Modify newfile"
[master 6477c34] Modify newfile
 1 file changed, 1 insertion(+)
  7. Run git log to see the new commit and its parent:
$ git log
commit 6477c34879083ddb338306a21079d9d4422266d4 (HEAD -> master)
Author: Jacob Stopak <jacob@initialcommit.io>
Date:   Sun Sep 18 12:30:51 2022 -0700

    Modify newfile

commit 72e2a98ec8d43f2547b31583f7a6a116eaff82e1
Author: Jacob Stopak <jacob@initialcommit.io>
Date:   Sun Sep 18 12:28:07 2022 -0700

    Add newfile

Git commit options

Here are some useful and common command-line options and flags for the git commit command.

Set the commit message from the command line

The most commonly used option for the git commit command is the -m flag, which is used to specify a commit message on the command line as follows:

$ git commit -m "Add new feature to the application"

Note that it's very important to write clear, concise commit messages in a consistent style.

Automatically stage and commit changes in one command

Next, the -a flag can be used to automatically stage changed tracked files and commit them in a single step:

$ git commit -a -m "Modifying the file"

Note that this only stages and commits tracked files that have changes, including tracked files you have deleted. It does not stage newly created, untracked files.

Amend your last commit message

Sometimes, right after you commit, you'll notice a typo or mistake in the commit message. You can quickly fix this using the --amend option, along with -m to specify the new commit message.

$ git commit --amend -m "Modifying the file, adjusted"

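Note that you can only use this method to fix the message of the most recent commit, i.e. the tip of the current branch. Because the commit message is part of the commit's content, changing it changes the commit's SHA-1 hash, so --amend actually creates a new commit that replaces the old tip and moves the branch to it. Any changes currently in the staging area are also included in the amended commit.

If Git allowed this for commits earlier in the branch, it would need to rebuild every subsequent commit, since each one stores its parent's ID. To reword an older commit, you can use an interactive rebase (git rebase -i), which rewrites that commit and every commit after it.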
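The knock-on effect is easy to demonstrate. This Python sketch builds a hypothetical three-commit chain in which every commit's ID covers its parent's ID, then rebuilds it with one message changed. Changing the tip's message only changes the tip's ID, which is the case --amend handles; changing the first message changes every ID after it. The helper names, tree, author, and timestamp are all made up for illustration.

```python
import hashlib
from typing import List, Optional

TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree
SIGNATURE = "Jane Doe <jane@example.com> 1700000000 +0000"  # made up


def commit_id(parent: Optional[str], message: str) -> str:
    """SHA-1 ID of a commit whose content includes its parent's ID."""
    lines = [f"tree {TREE}"]
    if parent is not None:  # the first commit in the chain has no parent
        lines.append(f"parent {parent}")
    lines += [f"author {SIGNATURE}", f"committer {SIGNATURE}"]
    content = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def build_chain(messages: List[str]) -> List[str]:
    """Create one commit per message, each pointing at the previous one."""
    ids: List[str] = []
    parent: Optional[str] = None
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


before = build_chain(["Add newfile", "Modify newfile", "Update docs"])
tip_fixed = build_chain(["Add newfile", "Modify newfile", "Update docs (fixed)"])
root_fixed = build_chain(["Add newfile (fixed)", "Modify newfile", "Update docs"])

print([a == b for a, b in zip(before, tip_fixed)])   # [True, True, False]
print([a == b for a, b in zip(before, root_fixed)])  # [False, False, False]
```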

What to do after you commit in Git

After you make a commit, you can continue your development on your current branch. If the feature you're working on is complete, it may be time to merge the branch back into the development or main branch.

In addition, you will probably want to use git push to upload your new commits to the remote repository. This backs up your work and shares the commits with other collaborators on your project.

Summary

In this article, we discussed the basics of commits in Git. We explained what a commit object is, where commits are stored, and how they are formatted.

We also examined various ways to create commits and what to do after you create one.

Next Steps

If you're interested in learning more about how Git works under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how Git works at the code level. To do this, we documented the first version of Git's code and discussed it in detail.

We hope you enjoyed this post! Feel free to shoot me an email at jacob@initialcommit.io with any questions or comments.

References

  1. Memory buffer - https://en.wikipedia.org/wiki/Data_buffer
  2. OpenSSL SHA Library - https://www.openssl.org/docs/man3.0/man3/SHA1.html
  3. Zlib library - https://zlib.net/

Final Notes