4 Git Commands that Rewrite Commit History

Introduction

In a previous post, we discussed the inner workings of Pijul, a version control system that provides an alternative to Git. With Pijul, the identity of a commit (the commit ID) is always preserved, regardless of its position in the branch.

Many of Git's commands, such as git add, git mv, git pull, and git branch, don't rewrite existing commits at all.

However, certain Git operations do rewrite history by replacing past commits with new ones that have different commit IDs. By rewriting a branch’s history, past commits can be cleaned up and reorganized to make the commit history more readable. Commits can be revised, combined, split, removed, or even reordered.

Although rewriting history can create a clearer commit history, it also comes with potential pitfalls. Because rewriting history can create conflicts with other versions of the branch, it’s best to avoid rewriting history that has been pushed to a remote branch. Additionally, rewriting history replaces the branch’s past states with the revised history, so the branch itself no longer records its true past state.

In this article, we’ll cover some of the most common Git commands used to rewrite history.

1. git commit --amend

Adding the --amend option to the git commit command lets you modify the content and/or message of the last commit on the current branch. To change the contents of the most recent commit, make any file changes you forgot to include and then stage those changes with the git add command. You can then modify your last commit by entering:

git commit --amend

Note: To update only the commit message of the last commit, run the command with no changes staged.

After running this command, a text editor session will open, where the last commit message can be revised. Saving and quitting the text editor will update the last commit in the repository, which you can check by running git log.

The git commit --amend command changes the commit ID of the last commit because the commit ID is a SHA-1 hash of the commit object, which records the committed snapshot, the parent commit, the author and committer details with their timestamps, and the commit message. Since one or more of these items change, the commit ID changes as well. For this reason, you should generally avoid using this command to amend a commit that you’ve already pushed to a shared branch: the branch would diverge from the copies other users have already obtained, and they would have to reconcile the two histories.
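
The effect on the commit ID can be checked directly. The following is a minimal sketch, assuming a throwaway repository created with mktemp; the file names, user details, and commit messages are illustrative only. It uses --no-edit so that the existing message is kept and no editor opens.

```shell
#!/bin/sh
# Sketch: amending the last commit replaces it with a commit that has a new ID.
# The temporary repository, file names, and messages are illustrative.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
old_id=$(git rev-parse HEAD)

# Stage a forgotten file and fold it into the last commit.
# --no-edit keeps the existing commit message.
echo "forgotten" > extra.txt
git add extra.txt
git commit -q --amend --no-edit
new_id=$(git rev-parse HEAD)

# The branch still has a single commit, but its ID is different.
commit_count=$(git rev-list --count HEAD)
echo "before amend: $old_id"
echo "after amend:  $new_id"
echo "commits on branch: $commit_count"
```

The two printed IDs differ even though the branch still contains only one commit, which is why an amended commit no longer matches a copy that was pushed earlier.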

2. git rebase

The git rebase command is used to edit one or more existing commits in your local branch history. This command can be used to combine, edit, reorder, or remove commits. When performing an interactive rebase, indicated by adding an -i flag to the command, a subcommand can be set for each commit in the text editor. The command must also specify which commits to rebase.

For example, the command to interactively rebase the last four commits would be:

git rebase -i HEAD~4

After running this command, the editor window will display the last four commits, along with a list of permitted subcommands:

pick ca3eed4 Add README
pick 38d0f5a Update user documentation
pick 0966ba9 Add basic implementation
pick 71a2e6f Fix user interface bug

# Rebase cbfaa21..71a2e6f onto cbfaa21 (4 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
#       However, if you remove everything, the rebase will be aborted.
#
#
# Note that empty commits are commented out

As mentioned in the above code snippet, several subcommands can be used to change the commit history when rebasing. Every commit will be preceded by pick by default, which tells Git to include the commit in the rebase. This option can be replaced with squash to combine the commit with the previous commit, drop to remove the commit, or edit to split or otherwise change a commit. Reordering the commits in the editor window changes the order of the commits in the commit history.

For example, consider updating the text in the above editor window to these subcommands:

pick ca3eed4 Add README
squash 38d0f5a Update user documentation
pick 0966ba9 Add basic implementation
drop 71a2e6f Fix user interface bug

After saving and exiting the editor, Git will rewind the branch and apply all of the changes in this order. In this example, the commits containing the README and user documentation would be squashed into a single commit. The content of the “basic implementation” commit would remain unchanged, but its commit ID would change because its parent is now the new squashed commit. The bug fix would be removed entirely from the commit history.
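
The same squash-and-drop edit can be reproduced without typing in an editor. This is a sketch under stated assumptions: it builds a throwaway repository with mktemp, and it sets Git's GIT_SEQUENCE_EDITOR environment variable to a sed command that edits the todo list in place of a human. The file names and commit messages are illustrative and only mirror the example above.

```shell
#!/bin/sh
# Sketch: scripted interactive rebase that squashes the 2nd commit and drops the 4th.
# Repository contents are illustrative; sed stands in for the interactive editor.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# One base commit plus four commits to rebase, each touching its own file.
for name in base readme docs impl fix; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "Add $name"
done
impl_before=$(git rev-parse HEAD~1)

# Line 2 of the todo list becomes "squash" and line 4 becomes "drop".
# GIT_EDITOR=true accepts the combined squash message unchanged.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/' -e '4s/^pick/drop/'" \
GIT_EDITOR=true \
    git rebase -q -i HEAD~4

impl_after=$(git rev-parse HEAD)
commit_count=$(git rev-list --count HEAD)
git log --oneline
```

After the rebase, three commits remain: the base, the squashed README and documentation commit, and the implementation commit. The implementation commit's files are unchanged, yet impl_after differs from impl_before because its parent changed.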

If a rebase uses the edit option, Git will return to the terminal upon reaching that command in the rebase. To make changes to the current commit, undo the commit while keeping its changes in the working tree by running the git reset command:

git reset HEAD^

After making any file updates, changes can be staged and committed in as many separate chunks as you desire. To continue to the next step in the interactive rebase, run:

git rebase --continue
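
The edit workflow above can be sketched end to end as a script that splits one commit into two. It assumes a throwaway repository created with mktemp and again uses GIT_SEQUENCE_EDITOR with sed in place of the interactive editor; the file names and messages are illustrative.

```shell
#!/bin/sh
# Sketch: split one commit into two using the "edit" subcommand.
# Repository contents are illustrative; sed stands in for the interactive editor.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

# One commit that bundles two unrelated files, followed by a later commit.
echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"
echo "c" > c.txt
git add c.txt
git commit -q -m "Add c"

# Mark the first listed commit as "edit"; Git stops right after applying it.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i HEAD~2

# Undo that commit but keep its changes in the working tree.
git reset -q HEAD^

# Recommit the changes as two separate commits, then resume the rebase.
git add a.txt
git commit -q -m "Add a"
git add b.txt
git commit -q -m "Add b"
GIT_EDITOR=true git rebase --continue

commit_count=$(git rev-list --count HEAD)
subjects=$(git log --reverse --format=%s | tr '\n' ',')
echo "$subjects"
```

The branch ends up with four commits instead of three, with the bundled commit replaced by "Add a" and "Add b" and the later commit replayed on top.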

Rebasing in Git is especially useful for cleaning up a local branch before pushing it to the remote. It gives the developer an additional level of control over the commit history. However, because rebased commits will have completely new commit IDs, it is recommended that developers only rebase commits that haven’t been pushed yet.

3. git cherry-pick

This command is used to take one or more commits from an existing branch and apply them to another. This can be useful if you already committed a change on one branch, and want to implement it on another branch without merging the rest of the original branch’s history.

The first step is to use git log while on the source branch to check the ID of the commit to be cherry-picked.

After switching to the destination branch, cherry-pick the commit by running:

git cherry-pick commitID

To cherry-pick a range of commits instead, run:

git cherry-pick commitID1^..commitID2

where commitID1 is the earliest commit to include and commitID2 is the most recent.

The cherry-pick command provides an easy way to apply an existing commit to the current branch without a full rebase. Because the commit now depends on a different commit history, however, the cherry-picked commit will have a different commit ID on the current branch than on the original. Since Git treats the original and the copy as two separate commits, this duplication can lead to unexpected merge conflicts when the source branch is later merged or when further commits are cherry-picked from it. Additionally, since cherry-picking only applies changes from the selected commits, it’s important to check that these changes work as expected on the current branch.
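
The difference between "same change" and "same commit" can be seen in a small experiment. This sketch assumes a throwaway repository created with mktemp, with illustrative branch, file, and message names. It compares the commit IDs of the original and the cherry-picked copy, and uses git patch-id to compare the changes they introduce.

```shell
#!/bin/sh
# Sketch: a cherry-picked commit carries the same change under a new commit ID.
# Branch names, file names, and messages are illustrative.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"
main_branch=$(git symbolic-ref --short HEAD)

# Commit a fix on a separate branch.
git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"
source_id=$(git rev-parse HEAD)

# Move the destination branch forward so its history differs.
git checkout -q "$main_branch"
echo "other" > other.txt
git add other.txt
git commit -q -m "Add other work"

# Copy the fix onto the destination branch.
git cherry-pick "$source_id" > /dev/null
picked_id=$(git rev-parse HEAD)

# Same change, different commit: patch IDs match, commit IDs do not.
source_patch=$(git show "$source_id" | git patch-id --stable | cut -d' ' -f1)
picked_patch=$(git show "$picked_id" | git patch-id --stable | cut -d' ' -f1)
echo "commit IDs: $source_id vs $picked_id"
echo "patch IDs:  $source_patch vs $picked_patch"
```

The patch IDs are identical while the commit IDs differ, so both branches now contain the fix as two distinct commits.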

4. git filter-repo

This command is a more efficient, open-source alternative to the older git filter-branch command. It is distributed separately from Git, so it must be installed before it can be used. It can quickly rewrite the history of an entire repository using user-specified filters. This is a powerful tool with numerous applications, but because it can affect the commit history of the entire repository, it should be used with caution.

Although there are many ways to use this command, it’s especially useful for removing or updating data. For example, if a user has committed a file that should not be included, this command can be used to remove that file from all commits in the repository. To remove a file user_login_info.txt from the repository, run the following (the --force flag overrides the tool's safety check that the repository is a fresh clone):

git filter-repo --force --invert-paths --path user_login_info.txt
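
Before running a filter like the one above, it can help to list the commits that touch the file; running the same check afterwards should find none. The following sketch uses only plain git log and does not invoke git filter-repo itself. The throwaway repository, file contents, and commit messages are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: count the commits on any branch that touch a given path.
# The repository and its contents are illustrative.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "user=admin" > user_login_info.txt
git add user_login_info.txt
git commit -q -m "Add login info"

echo "readme" > README.md
git add README.md
git commit -q -m "Add README"

echo "user=root" >> user_login_info.txt
git commit -q -am "Update login info"

# --all searches every ref; the path after "--" limits output to commits touching it.
touching=$(git log --all --format=%h -- user_login_info.txt | wc -l | tr -d ' ')
echo "commits touching user_login_info.txt: $touching"
```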

This command can also be used to update the names and emails from past commits. Updating this information requires a mailmap in git-shortlog format that contains the names and emails you want to update. This could look something like:

New Name <new_email> Old Name <old_email>
New Name 2 <new_email2> Old Name 2 <old_email2>

To update these names and emails, run:

git filter-repo --mailmap my-mailmap

Conclusion

Git provides powerful commands for rewriting history that can be used to clean up and reorganize commit history. However, these commands come with potential pitfalls, so it’s important to understand the effects these tools can have.

Rewriting history can create conflicts with other versions of the branch, so in general, it’s best to avoid rewriting history that has been pushed to a remote branch. Additionally, because rewriting history replaces the past states of a branch with the revised history, the branch no longer records its true past state.

Although these commands should be used with caution when working with a remote branch, they can be incredibly useful for reorganizing local code. After all, no one wants to look through 17 different "wip" commits or a mountain of reverted commits. These Git commands can be used to create a more intuitive list of commits through the lens of retrospection.

Newer-generation version control systems like Pijul may use static IDs defined by the content to represent each change, instead of a commit ID. In this case, the ID of each commit would not depend on its context, order, history, parent, or timestamp, but instead on the content itself. As version control evolves, this change could remove the need to rewrite history in the ways we have seen in this article.

If you're interested in learning about how version control systems work under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how version control systems work at the code level. To do this, we documented the first version of Git's code and discussed it in detail.

We hope you enjoyed this post! Feel free to shoot me an email at jacob@initialcommit.io with any questions or comments.

Final Notes