Git Diff – What is it, Uses & Applications
Table of Contents
- Introduction
- What is Git Diff?
- What Does the Diff Output Mean?
- Git Diff Formatting Options
- Comparing Changes with Git Diff
- Git Diff Unstaged Changes in Working Directory with Last Commit
- Diff Two Files on the Filesystem (In Working Tree or Otherwise)
- View Results of a Merge Commit
- Comparing Branches
- Listing Whitespace Errors
- How Do I Save a Result From Git Diff and Apply It Later?
- Alternative Ways to Look at Diffs
- Summary
- Next Steps
- References
Introduction
You may have heard that Git is a great way to store version history. But how useful is it when you want to compare two or more versions in a repository?
In this article we will explore the ins and outs of the git diff command and how to use it. As you will see, this command is particularly flexible and useful at every stage in the Git life cycle. Additionally, git diff has many options for comparing data between commits and branches. Let's break it down.
A basic understanding of Git version control is recommended to get the most out of this article.
What is Git Diff?
Git diff is a command-line tool used to determine the differences between two things, per line or even per character. It provides more detail than git status or git log, and is much more flexible in its applications.
"After every merge by default git will do a diffstat of everything that changed as a result of that merge because I do care about that. When I merge from somebody, I trust them but on the other hand, hey they might have stopped using their medication, so I trust them but, let's just be honest here, they might have been ok yesterday, but today might not be a good day, so I do diffstat and git does that by default" - Linus Torvalds
What do we mean by two "things"? Almost anything that you want: the working directory, the staging area, HEAD, committed changes, branches, or tags.
Furthermore, git diff is especially useful for fixing bugs. In the case of a bug in your application, you may run the git bisect command to help identify which commit introduced the bug. You can then use git diff to list the changes between the commit or branch that introduced the bug and the previously working commit.
You can also use git diff to compare one or more files that aren't tracked by Git, or are even outside of the working directory. This is one of the few Git commands that doesn't even need to be used within an existing Git repo. Most Git commands such as git add and git commit will throw an error like fatal: not a git repository (or any of the parent directories): .git if used outside of a Git repository. However, you can use git diff anywhere on your filesystem.
In addition, the git diff output can be formatted to show file names only, instead of the full textual diff.
What Does the Diff Output Mean?
The git diff command outputs a text-based representation known as the unified format.
Here is an example of the git diff default usage and output:
$ git diff
diff --git a/demo.rb b/demo.rb
index 16755f9..a474330 100644
--- a/demo.rb
+++ b/demo.rb
@@ -1,3 +1,3 @@
 puts 'hello'
-puts 'cruel'
+puts 'nice'
 puts 'world!'
The output structure is easily explained, but can be tricky to fully grasp. Note that the following sections may refer to "left" and "right" versions. You can think of this as essentially being the previous and current version.
The output may be broken down into the following sections: (3)
- Comparison input (header)
  - Displays the left and right files (prefixed with a/ and b/) and, on the index line, the abbreviated hashes of the left and right versions of the file
- Sections
  - Section header
    - Set off by the @@ symbol (@@ -1,3 +1,3 @@)
    - The left and right files are denoted by - and + (@@ -1,3 +1,3 @@)
    - The section start line number for each version is the first number in each comma-delimited pair (here, 1 for both)
    - The section length for each version is the second number in each pair (here, 3 for both)
  - Changes
    - Lines only in the left version start with -
    - Lines only in the right version start with +
    - Lines in both versions (context lines) start with a space
    - You can control the number of context lines with git diff -U<n>, where <n> is the number of lines
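The section header arithmetic described above can be checked in a throwaway repository. The following is a minimal sketch, assuming git is installed; the file name demo.txt, its contents, and the identity settings are made up for illustration.

```shell
set -eu

# Work in a throwaway repository (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit a seven-line file, then change only line 4.
printf '%s\n' one two three four five six seven > demo.txt
git add demo.txt
git commit -q -m "initial"
printf '%s\n' one two three FOUR five six seven > demo.txt

# Keep only the "@@ -start,length +start,length @@" part of each section
# header; Git may append nearby context text after the closing @@.
hunk_default=$(git diff | grep -o '^@@ [^@]* @@')
hunk_u1=$(git diff -U1 | grep -o '^@@ [^@]* @@')

echo "default context: $hunk_default"   # 3 context lines on each side
echo "with -U1:        $hunk_u1"        # 1 context line on each side
```

With the default of three context lines, the section covers lines 1 through 7 of both versions. With -U1 it shrinks to lines 3 through 5, so both the start line and the length in the header change.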
Git Diff Formatting Options
You can modify the text output to suit your needs.
- Use --color-words to highlight changes on a per-word basis using only colors:
git diff --color-words
- Use --no-prefix to hide the source and destination prefix:
git diff --no-prefix
- Use --src-prefix to specify a custom source prefix instead of "a/":
git diff --src-prefix <prefix>
- Use --dst-prefix to specify a custom destination prefix instead of "b/":
git diff --dst-prefix <prefix>
- Use --word-diff to show a word diff, using <mode> (plain by default) to delimit changed words:
git diff --word-diff[=<mode>]
- Use --ignore-space-change (or -b) to ignore changes in the amount of whitespace. This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent:
git diff -b
- Use the GIT_DIFF_OPTS environment variable. The name is a bit of a misnomer: the only valid values are -u<n> or --unified=<n>, which control the number of context lines shown in a git diff command.
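To see two of these options in action, here is a small sketch that reuses the demo.rb example from earlier in a temporary repository. It assumes git is installed; the identity settings and variable names are illustrative only.

```shell
set -eu

# Throwaway repository containing the demo.rb example.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf '%s\n' "puts 'hello'" "puts 'cruel'" "puts 'world!'" > demo.rb
git add demo.rb
git commit -q -m "initial"
printf '%s\n' "puts 'hello'" "puts 'nice'" "puts 'world!'" > demo.rb

# --word-diff (plain mode) wraps removed words in [-...-]
# and added words in {+...+}.
word_line=$(git diff --word-diff | grep -F '[-')

# --no-prefix drops the a/ and b/ prefixes from the file names.
header_line=$(git diff --no-prefix | sed -n '1p')

echo "$word_line"
echo "$header_line"
```

Because words are split on whitespace by default, the whole quoted token is reported as changed rather than just the letters inside it.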
Comparing Changes with Git Diff
There are many different ways you can use git diff for comparison purposes, such as comparing (known as diffing) local unstaged changes to the previous commit, comparing staged changes to the previous commit, comparing any two files on your machine, and more. Let's go through some of these examples now.
Git Diff Unstaged Changes in Working Directory with Last Commit
$ git diff
You can think of this as the default form of git diff: the plain git diff command with no options. It shows only the local changes in your repo that have not been staged yet. After you have staged all of your changes, git diff will return no output.
When no branch name, commit ID, or other ref is specified, Git compares your working directory to the staging area (index). If nothing has been staged since the last commit (HEAD), the staging area matches HEAD, so the output is the same as a comparison against the last commit.
Git Diff Staging Area and Last Commit
$ git diff --staged
You can add the --staged flag if you want to compare staged changes to the last commit, instead of unstaged changes. Note that git diff --staged is the same as git diff --cached.
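The difference between the two forms is easiest to see before and after staging a change. The sketch below assumes git is installed and uses a made-up file, notes.txt, in a temporary repository; --name-only is used so that only file names are printed.

```shell
set -eu

# Throwaway repository with one committed file (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'v1\n' > notes.txt
git add notes.txt
git commit -q -m "initial"

# Modify the file without staging it.
printf 'v2\n' > notes.txt
unstaged_before=$(git diff --name-only)          # lists notes.txt
staged_before=$(git diff --staged --name-only)   # empty

# Stage the change: it moves from one view to the other.
git add notes.txt
unstaged_after=$(git diff --name-only)           # empty
staged_after=$(git diff --cached --name-only)    # lists notes.txt

echo "before add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:  unstaged='$unstaged_after' staged='$staged_after'"
```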
Comparing Changes in Specific Files: Git Diff -- File
By default, Git will include changes to all available files in the diff output. If you want to diff only a few files at a time, then use the -- <paths> argument. All of the different flavors of git diff support filtering files like this:
$ git diff -- myfile.txt
The example above will only display changes (if any) present in the myfile.txt file. You can pass multiple files at a time.
Diff Two Files on the Filesystem (In Working Tree or Otherwise)
$ git diff --no-index <path1> <path2>
This form allows the comparison of data between files that aren't in the working directory or aren't tracked by Git at all. The --no-index flag is optional (and implied) when one or more of the files is outside of the working directory.
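As a quick illustration, the sketch below compares two untracked files in a temporary directory. It assumes git is installed; the file names and contents are invented. Note that --no-index implies --exit-code, so git diff exits with status 1 when the files differ, which matters in scripts that stop on errors.

```shell
set -eu

# Two plain files in a temporary directory, no repository required.
dir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$dir/left.txt"
printf 'alpha\ngamma\n' > "$dir/right.txt"

# Capture the exit status explicitly so `set -e` does not abort
# when the files differ.
status=0
output=$(git diff --no-index "$dir/left.txt" "$dir/right.txt") || status=$?

echo "$output"
echo "exit status: $status"
```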
Diff Working Tree and Named Commit
$ git diff <commit>
Compare the working directory, including both staged and unstaged changes, with a named commit. You can use HEAD or a branch name instead of a commit hash if desired. Use HEAD to view all changes since the last commit. The diff output will contain a cumulative list of all changed files and their changes.
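A short sketch of how git diff HEAD differs from plain git diff, assuming git is installed. The files a.txt and b.txt are hypothetical: one change is staged and the other is not.

```shell
set -eu

# Throwaway repository with two committed files (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'a1\n' > a.txt
printf 'b1\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial"

printf 'a2\n' > a.txt
git add a.txt            # this change is staged
printf 'b2\n' > b.txt    # this change is left unstaged

# Plain `git diff` only sees what is not yet staged.
vs_index=$(git diff --name-only)

# `git diff HEAD` compares the working directory with the last commit,
# so it reports both files.
vs_head=$(git diff --name-only HEAD)

echo "git diff:      $vs_index"
echo "git diff HEAD: $(printf '%s' "$vs_head" | tr '\n' ' ')"
```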
Git Diff Comparing Files Between Two Different Commits
$ git diff <commit1> <commit2>
Compare changes between two arbitrary commits. They don't have to belong to the same branch.
$ git diff test
Instead of comparing against the tip of the current branch, you can compare your working directory to the tip of a "test" branch using git diff test.
$ git diff HEAD^ HEAD
You can compare the last commit to the commit before it using git diff HEAD^ HEAD.
View Results of a Merge Commit
Git's diff algorithm can be used to check the results of a merge commit:
git diff <merge-commit> <parent-commit>
In this example, the first commit argument must be the commit ID of the merge commit. Subsequent commit arguments refer to the list of parent commits. The list of parent commits can be automatically generated using the ^@ syntax:
git diff master master^@
This is equivalent to git show master.
Diff Two Commits Relative to a Common Ancestor
In certain situations, it may be useful to compare two commits based on a common ancestor, so that it is clear that certain changes were not introduced by one of the commits being compared.
For example, consider the below command, which is synonymous with the more common form covered in the "Git Diff Comparing Files Between Two Different Commits" section above:
git diff <commit>..<commit>
The diff from this command may be misleading. Say you have a master and a feature branch, and the master branch has gained a new commit since the feature branch was created. If that new commit on master added a line to a file, then this diff command will make it seem as if the feature branch deleted the line. This is caused by Git directly comparing the snapshots of the latest commits from both the feature and master branch.
Usually, what you want to see are only the changes added on the feature branch, i.e. the work you will introduce if you merge this branch into master. You do that by comparing your feature branch with the first common ancestor of the feature branch and master.
You can do this using the ... operator as follows:
git diff master...feature
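The scenario described above can be reproduced in a few commands. This is a sketch under the assumption that git is installed; the branch contents and file names (file.txt, feature.txt) are invented for the demonstration.

```shell
set -eu

# Throwaway repository with a master branch and a feature branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base"
git branch -M master     # make sure the first branch is called master

# The feature branch adds a new file.
git checkout -q -b feature
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Meanwhile, master gains a commit of its own.
git checkout -q master
printf 'master-only\n' >> file.txt
git commit -q -a -m "master moves on"

# Two dots: compares the two branch tips directly, so the change
# made on master shows up as well.
two_dot=$(git diff --name-only master..feature)

# Three dots: compares feature with the common ancestor, so only
# the work done on feature is listed.
three_dot=$(git diff --name-only master...feature)

echo "master..feature:  $(printf '%s' "$two_dot" | tr '\n' ' ')"
echo "master...feature: $three_dot"
```

Only the three-dot form limits the output to feature.txt. The two-dot form also lists file.txt, even though the feature branch never touched it.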
Diff Any Two Trees
It can be useful to diff any two trees from your Git repo so you may see the differences in data. Trees can be inferred from a commit ID, a branch name, HEAD, or a tag. This also supports the HEAD~ and HEAD^ syntax.
git diff <left tree> <right tree>
Comparing Branches
Comparing files from two Git branches is easy:
git diff branch1 branch2
This will display the unified diff between the commit IDs referenced by the two branch tips.
Combined Diff Format
By default, git diff displays the unified diff format between two commits. The combined diff format compares two or more files with a single file and shows how that file differs from each of them, which is useful when looking at merges. You can use the -c or --cc option to produce a combined diff.
How Do I Diff a File That Has Been Renamed?
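You don't need to do anything special to diff a file that has been renamed. Git does not record renames explicitly, but it detects them by comparing file contents and in most cases will show the file as renamed in a unified output. You can also request rename detection explicitly with the -M (--find-renames) option.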
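Here is a minimal sketch of what rename detection looks like, assuming git is installed. The file names are made up, and --name-status is used to keep the output to one line per file.

```shell
set -eu

# Throwaway repository with one committed file (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'line one\nline two\nline three\n' > old_name.txt
git add old_name.txt
git commit -q -m "initial"

# Rename the file without changing its contents.
git mv old_name.txt new_name.txt

# -M asks for rename detection explicitly; the status column shows
# R followed by the similarity percentage.
status_line=$(git diff --staged --name-status -M)
echo "$status_line"
```

Since the contents are unchanged, the similarity score is 100 and the line shows both the old and the new path.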
Listing Whitespace Errors
Git tracks whitespace data and will produce warnings. You can configure if you want these changes to show in your output.
$ git diff --check # identify and list possible whitespace errors
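For example, the following sketch adds a line with trailing spaces to a tracked file and runs the check. It assumes git is installed with the default core.whitespace settings; the file name and contents are illustrative. The command exits with a non-zero status when it finds problems, so the status is captured explicitly.

```shell
set -eu

# Throwaway repository with one clean committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'clean line\n' > demo.txt
git add demo.txt
git commit -q -m "initial"

# Append a line that ends in spaces.
printf 'trailing spaces   \n' >> demo.txt

# Capture the report and the exit status without tripping `set -e`.
status=0
report=$(git diff --check) || status=$?

echo "$report"
echo "exit status: $status"
```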
For Windows users, automatically convert line endings to CRLF on checkout, and convert back to LF on checkin:
$ git config --global core.autocrlf true
For Linux and macOS users, automatically convert CRLF to LF on checkin. This is useful if one of your contributors is on Windows and forgot to set the previous option:
git config --global core.autocrlf input
Here are some git config --global core.whitespace settings:
- A prepended - disables the option
- Omit options to use their default value
- Options are comma delimited
Enabled by default:
- blank-at-eol: looks for spaces at the end of a line
- blank-at-eof: notices blank lines at the end of a file
- space-before-tab: looks for spaces before tabs at the beginning of a line
Disabled by default:
- indent-with-non-tab: looks for lines that begin with spaces instead of tabs (and is controlled by the tabwidth option)
- tab-in-indent: watches for tabs in the indentation portion of a line
- cr-at-eol: tells Git that carriage returns at the end of lines are OK
Viewing Diff with Git Log
"My favorite way to log commits is git log -p" -Harvard CS
$ git log -p
Using the -p flag with the git log command is a great way to include the patch (diff output) inline in the git log output.
Understanding Diff when Merging and Rebasing
When merging or rebasing in Git, you may have noticed references to ours and theirs. When merging, ours refers to the currently checked out branch, and theirs refers to the branch you are merging from:
git checkout merge-into-ours
git merge from-theirs
When rebasing in Git, they are flipped. The ours keyword refers to an anonymous branch that holds the result of the rebase so far, and theirs refers to the original branch being rebased.
You can run these commands in the middle of a merge or rebase to help orient yourself with respect to the changes being made:
git diff --ours
git diff --theirs
git diff --base
Git Diff and Submodules
The git diff --submodule flag can be used to show what actually changed in a submodule, instead of merely indicating that something changed.
$ git diff --submodule
If you don't want to type --submodule every time you run git diff, you can set it as the default format by setting the diff.submodule config value to "log":
$ git config diff.submodule log
Another way to accomplish this is with the submodule foreach command:
git submodule foreach 'git diff'
Diffing Binary Files
It is possible to diff various non-text files using Git Attributes. Essentially you configure a custom diff program that converts the binary to text first. Think of it like a plugin. Take Microsoft Word documents for example:
Create the .gitattributes file and add the following data:
*.docx diff=word
Install and configure docx2txt:
$ pip install docx2txt
$ git config diff.word.textconv docx2txt
Git diff will now show the text changes when comparing Microsoft Word documents. However, formatting changes will not show up.
Image files may be diffed in a similar way, by configuring Git to compare the EXIF metadata:
*.png diff=exif
$ git config diff.exif.textconv exiftool
Similar things can be done for Excel files and other binary files.
Diffs in GitHub or GitLab
Most Git server software provides excellent graphical diff support. This is especially useful when reviewing a merge request or pull request.
How Do I Diff a File Over Time in the Same Branch?
One possible solution would be to use git log to figure out the hash of the relevant commit that changed the specified file in the specified time frame, and then use the regular git diff commands after that (6):
git log --before="yyyy-MM-dd" --after="yyyy-MM-dd" --follow -- <PATH-TO-FILE>
Getting Back to Clean State
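To return to a clean git diff, discard the local changes to tracked files with git reset --hard and remove any untracked files with git clean -f. Both commands permanently discard work that has not been committed.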
Saving What is Shown in the Diff
To actually save the changes shown by git diff, use git commit or git stash.
How Do I Save a Result From Git Diff and Apply It Later?
The diff output is also called a patch. It can be used as input to other Git commands to actually apply those changes. This is called applying a patch in Git.
To save a patch from git diff, run one of the following:
$ git diff > myfile.patch
$ git format-patch -k --stdout HEAD^1..HEAD > myfile.mbox
If the patch was generated using git diff or the Unix diff command, then it can be applied using git apply myfile.patch. This will make changes to the working directory. The command is transactional, meaning it will either apply completely or abort.
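The save-and-apply round trip can be sketched as follows, assuming git is installed. The file demo.txt and the temporary patch file are illustrative; git apply --check is used first to test whether the patch would apply without changing anything.

```shell
set -eu

# Throwaway repository with one committed file (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'hello\n' > demo.txt
git add demo.txt
git commit -q -m "initial"

# Make a change and save it as a patch outside the repository.
printf 'hello world\n' > demo.txt
patch_file=$(mktemp)
git diff > "$patch_file"

# Throw the change away, so the file matches the last commit again.
git checkout -q -- demo.txt
reverted=$(cat demo.txt)

# Dry run first, then apply the patch to the working directory.
git apply --check "$patch_file"
git apply "$patch_file"
restored=$(cat demo.txt)

echo "after checkout: $reverted"
echo "after apply:    $restored"
```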
If the patch was generated using the git format-patch command, then your job is easier because the patch already contains the commit author and commit message information. Apply the patch using the git am command. This command is able to automatically create commits instead of just changing the working directory.
As an aside, the "git am" command is built to read an mbox file, which is a simple, plain-text format for storing one or more email messages in one text file. This is one possible way to collaborate with git over email, rather than using git on a server.
Alternative Ways to Look at Diffs
The text output of a diff can be hard to read. Many graphical or external diff viewing programs exist to make life easier by displaying diffs side by side.
Common programs used are Sublime Text, Eclipse, VSCode, p4merge, Intellij, and Beyond Compare.
Simply run git difftool instead of git diff to open an external viewing program of your choice.
Run git difftool --tool-help to see what is available on your system.
To configure an external diff tool run git config --global diff.external <executable name>. You may need to wrap the executable in a shell script to get the arguments to be passed correctly. Git passes 7 arguments to the external diff tool, but you usually only need arguments number 2 and 5 (old file and new file).
Environment variables:
GIT_EXTERNAL_DIFF is used as an override for the diff.external configuration value. If it’s set, Git will invoke this program when git diff is invoked.
Summary
In this article, we discussed the git diff command, how it works, how to use it to track changes in data, and a variety of different scenarios to apply it. You can use git diff at any point in the git life cycle to track changes in files, working directories, committed changes, and more.
Next Steps
If you're interested in learning more about how Git works under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how Git works at the code level. To do this we documented the first version of Git's code and discuss it in detail.
We hope you enjoyed this post! Feel free to shoot me an email at jacob@initialcommit.io with any questions or comments.
References
- Git Kernel Wiki - https://git.wiki.kernel.org/index.php/LinusTalk200705Transcript
- Wikipedia, Diff Unified Format - https://en.wikipedia.org/wiki/Diff#Unified_format
- Harvard, Diff Ref - https://cs61.seas.harvard.edu/site/ref/diff/
- Harvard, Git Ref - https://cs61.seas.harvard.edu/site/ref/git/
- GitHub, Python docx2txt - https://github.com/ankushshah89/python-docx2txt
- StackOverflow - https://stackoverflow.com/questions/9658110/git-diff-on-date/9658178#9658178
- Wikipedia, Git GUI Comparison - https://en.wikipedia.org/wiki/Comparison_of_Git_GUIs