Git Diff – What is it, Uses & Applications

Introduction

You may have heard that Git is a great way to store version history. But how useful is it when you want to compare two or more versions in a repository?

In this article we will explore the ins and outs of the git diff command and how to use it. As you will see, this command is particularly flexible and useful at every stage in the git life cycle. Additionally, Git diff has many options for comparing data between commits and branches. Let's break it down.

A basic understanding of Git version control is recommended to get the most out of this article.

What is Git Diff?

Git diff is a command-line tool used to determine the differences between two things, per line or even per character. It provides more detail than git status or git log, and is much more flexible in its applications.

"After every merge by default git will do a diffstat of everything that changed as a result of that merge because I do care about that. When I merge from somebody, I trust them but on the other hand, hey they might have stopped using their medication, so I trust them but, let's just be honest here, they might have been ok yesterday, but today might not be a good day, so I do diffstat and git does that by default" - Linus Torvalds

What do we mean by two "things"? Almost anything you want: the working directory, the staging area (index), HEAD, commits, branches, or tags.

Furthermore, git diff is especially useful for fixing bugs. If your application has a bug, you can run the git bisect command to help identify which commit introduced it. You can then use git diff to list the changes between the commit or branch that introduced the bug and the last known working commit.

You can also use git diff to compare two files that aren't tracked by Git, or that are even outside of the working directory. This is one of the few Git commands that doesn't need to be run within an existing Git repo. Most Git commands, such as git add and git commit, will throw an error like fatal: not a git repository (or any of the parent directories): .git if used outside of a Git repository. However, you can use git diff anywhere on your filesystem.

In addition, git diff can show only the names of the changed files (with the --name-only option) instead of the full textual diff.

What Does the Diff Output Mean?

The git diff command outputs a text based representation known as the unified format.

Here is an example of the git diff default usage and output:

$ git diff
diff --git a/demo.rb b/demo.rb
index 16755f9..a474330 100644
--- a/demo.rb
+++ b/demo.rb
@@ -1,3 +1,3 @@
 puts 'hello'
-puts 'cruel'
+puts 'nice'
 puts 'world!'

The output structure is easily explained, but can be tricky to fully grasp. Note that the following sections may refer to "left" and "right" versions. You can think of this as essentially being the previous and current version.

The output may be broken down into the following sections: (3)

  1. Comparison input (header)
    • Displays the left and right files (prefixed with a/ and b/) and, on the index line, the abbreviated object hashes of the left and right file contents, followed by the file mode
  2. Sections
    • Section header
      • Set off by the @@ symbols (@@ -1,3 +1,3 @@)
      • The left and right versions are denoted by - and + (@@ -1,3 +1,3 @@)
      • The section's starting line number in each version is the first number in each comma-delimited pair (@@ -1,3 +1,3 @@)
      • The section's length in each version is the second number (@@ -1,3 +1,3 @@)
    • Changes
      • Lines only in the left version start with -
      • Lines only in the right version start with +
      • Lines in both versions (context lines) start with a space
      • You can control the number of context lines with git diff -U<n>, where <n> is the number of lines (the default is 3)
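To make the section header concrete, here is a minimal sketch that recreates the demo.rb example in a throwaway repository and compares the header produced with the default context against the one produced with -U0. The temporary directory, the identity settings, and the commit message are assumptions made only for this illustration.

```shell
set -e
# Throwaway repository so the sketch does not touch any real project.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf "puts 'hello'\nputs 'cruel'\nputs 'world!'\n" > demo.rb
git add demo.rb
git commit -q -m "Add demo.rb"

# Change only the middle line.
printf "puts 'hello'\nputs 'nice'\nputs 'world!'\n" > demo.rb

# Default context (3 lines): the hunk covers the whole 3-line file.
default_header=$(git diff | grep '^@@')
# No context lines: the hunk covers only line 2. Git may append a
# nearby line of the file after the closing @@ as a location hint.
zero_header=$(git diff -U0 | grep '^@@')

echo "default: $default_header"
echo "-U0:     $zero_header"
```

With no context lines the section shrinks to the single changed line, and a length of 1 is omitted from the header, so the pair -2,1 is written simply as -2.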

Git Diff Formatting Options

You can modify the text output to suit your needs.

  • Use --color-words to highlight changes on a per-word basis using only colors: git diff --color-words
  • Use --no-prefix to hide the source and destination prefix: git diff --no-prefix
  • Use --src-prefix to specify a custom source prefix instead of "a/": git diff --src-prefix=<prefix>
  • Use --dst-prefix to specify a custom destination prefix instead of "b/": git diff --dst-prefix=<prefix>
  • Use --word-diff to show a word diff, using the optional <mode> (plain by default) to choose how changed words are delimited: git diff --word-diff[=<mode>]
  • Use --ignore-space-change (or -b) to ignore changes in the amount of whitespace. This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent: git diff --ignore-space-change
  • Use the GIT_DIFF_OPTS environment variable to set the number of context lines. The name is a bit of a misnomer: the only valid values are -u<n> or --unified=<n>, which control the number of context lines shown by git diff.
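As a rough illustration of a few of these options, the sketch below runs them against the same one-line change used earlier. The repository is a temporary one created just for the example, and the old/ and new/ prefixes are arbitrary values chosen here.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf "puts 'hello'\nputs 'cruel'\nputs 'world!'\n" > demo.rb
git add demo.rb
git commit -q -m "Add demo.rb"
printf "puts 'hello'\nputs 'nice'\nputs 'world!'\n" > demo.rb

# Word diff in the default (plain) mode: removed words are wrapped
# in [-...-] and added words in {+...+}.
word_diff=$(git diff --word-diff)

# Drop the a/ and b/ prefixes from the file names.
no_prefix=$(git diff --no-prefix)

# Replace the prefixes with custom ones (arbitrary names for this sketch).
custom_prefix=$(git diff --src-prefix=old/ --dst-prefix=new/)

echo "$word_diff"
echo "$no_prefix"
echo "$custom_prefix"
```

The word diff is often easier to read than the line-based output when only a small part of a long line has changed.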

Comparing Changes with Git Diff

There are many different ways to use git diff for comparison purposes (known as diffing), such as comparing local unstaged changes to the previous commit, comparing staged changes to the previous commit, comparing any two files on your machine, and more. Let's go through some of these examples now.

Git Diff Unstaged Changes in Working Directory with Last Commit

$ git diff

You can think of this as the default form of git diff: the plain command with no options. It shows only the local changes in your working directory that have not been staged. After you have staged all of your changes, git diff will return no output.

When no branch name, commit ID, or other ref is specified, git diff compares your working directory to the staging area (index). If nothing is staged, the staging area matches the HEAD commit (the currently checked out commit), so the output is the same as a comparison with the last commit.

Git Diff Staging Area and Last Commit

$ git diff --staged

You can add the --staged flag if you want to compare staged changes to the last commit, instead of unstaged changes. Note that --staged is a synonym for --cached.
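The difference between these forms is easiest to see when a file has both staged and unstaged changes at the same time. The following sketch sets up that situation in a temporary repository; the file name notes.txt and its contents are made up for the illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

echo "two" >> notes.txt
git add notes.txt            # "two" is now staged
echo "three" >> notes.txt    # "three" stays unstaged

unstaged=$(git diff)             # working directory vs. staging area
staged=$(git diff --staged)      # staging area vs. last commit
since_commit=$(git diff HEAD)    # working directory vs. last commit

echo "--- git diff ---";          echo "$unstaged"
echo "--- git diff --staged ---"; echo "$staged"
echo "--- git diff HEAD ---";     echo "$since_commit"
```

Here the plain git diff reports only the added line "three", git diff --staged reports only "two", and git diff HEAD reports both.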

Comparing Changes in Specific Files: Git Diff -- File

By default, Git will include changes to all available files in the diff output. If you want to diff only a few files at a time, then use the -- <paths> argument. All of the different flavors of git diff support filtering files like this:

$ git diff -- myfile.txt

The example above displays only the changes (if any) in the myfile.txt file. You can also pass multiple paths at a time.
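As a small sketch of path filtering, the example below modifies two files and then restricts the diff to one of them. It uses --name-only to keep the output short; the second file name, other.txt, is an assumption added for the illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "a" > myfile.txt
echo "b" > other.txt
git add myfile.txt other.txt
git commit -q -m "Add two files"

# Modify both files without staging them.
echo "a2" >> myfile.txt
echo "b2" >> other.txt

all_files=$(git diff --name-only)                 # every changed file
only_mine=$(git diff --name-only -- myfile.txt)   # limited to one path

echo "all changed files:"; echo "$all_files"
echo "filtered:";          echo "$only_mine"
```

The -- separator tells Git that everything after it is a path, which avoids ambiguity when a file happens to share its name with a branch or tag.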

Diff Two Files on the Filesystem (In Working Tree or Otherwise)

$ git diff --no-index <path1> <path2>

This form allows the comparison of data between files that aren't in the working directory or aren't tracked by Git at all. The --no-index flag is optional (and implied) when one or more of the files is outside of the working directory.
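Here is a minimal sketch of this form, run in a temporary directory that is not a Git repository. The file names and contents are invented for the example. Note that this form exits with status 1 when the files differ, so the script captures the status instead of letting set -e abort.

```shell
set -e
# A plain directory: no git init is needed for --no-index.
dir=$(mktemp -d) && cd "$dir"

printf 'alpha\nbeta\n'  > old.txt
printf 'alpha\ngamma\n' > new.txt

# Capture the exit status rather than failing on it.
status=0
out=$(git diff --no-index old.txt new.txt) || status=$?

echo "$out"
echo "exit status: $status"
```

The exit status makes this form convenient in scripts that only need to know whether two files differ.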

Diff Working Tree and Named Commit

$ git diff <commit>

Compare the working directory, including both staged and unstaged changes, with a named commit. You can use HEAD or a branch name instead of a commit hash if desired. Use HEAD to view all changes since the last commit. The diff output will contain a cumulative list of all changed files and their changes.

Git Diff Comparing Files Between Two Different Commits

$ git diff <commit1> <commit2>

Compare changes between two arbitrary commits. They don't have to belong to the same branch.

$ git diff test

If you pass a single branch name, git diff compares your working directory with the tip of that branch. For example, git diff test compares it with the tip of a "test" branch.

$ git diff HEAD^ HEAD

You can compare the last commit to the commit before it using git diff HEAD^ HEAD.

View Results of a Merge Commit

Git's diff algorithm can be used to check the results of a merge commit:

git diff <merge-commit> <parent-commit>

In this example, the first commit argument must be the commit ID of the merge commit. Subsequent commit arguments refer to the list of parent commits. The list of parent commits can be automatically generated using ^@ syntax:

git diff master master^@

If master points to a merge commit, this shows the same combined diff as git show master.

Diff Two Commits Relative to a Common Ancestor

In certain situations, it may be useful to compare two commits based on a common ancestor, so that it is clear that certain changes were not introduced by one of the commits being compared.

For example, consider the below command, which is synonymous with the more common form covered in the "Git Diff Comparing Files Between Two Different Commits" section above:

git diff <commit>..<commit>

The diff from this command may be misleading. Say you have a master and a feature branch, and the master branch has since gained a new commit. If you added a line to a file on the latest master branch commit, then this diff command will make it seem as if the feature branch deleted the line. This is caused by Git directly comparing the snapshots of the latest commits from both the feature and master branch.

Usually, what you want to see are only the changes added on the feature branch, i.e. the work you will introduce if you merge this branch into master. You do that by comparing your feature branch with the most recent common ancestor (the merge base) of the feature branch and master.

You can do this using the ... operator as follows:

git diff master...feature
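The scenario described above can be reproduced with a short sketch. It builds a temporary repository with a master and a feature branch, adds one commit to each after they diverge, and lists the files each form reports. The file names and commit messages are assumptions for illustration only.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Base commit"

# Work on the feature branch.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt"

# Meanwhile, master gains a new commit.
git checkout -q master
echo "new line on master" >> main.txt
git commit -q -am "Extend main.txt on master"

# Two dots: compares the two branch tips directly.
two_dot=$(git diff --name-only master..feature)
# Three dots: compares the common ancestor with the feature tip.
three_dot=$(git diff --name-only master...feature)

echo "master..feature:";  echo "$two_dot"
echo "master...feature:"; echo "$three_dot"
```

The two-dot form lists main.txt as well, even though the feature branch never touched it, while the three-dot form lists only feature.txt.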

Diff Any Two Trees

It can be useful to diff any two trees from your Git repo so you may see the differences in data. Trees can be inferred from a commit ID, a branch name, HEAD, or a tag. This also supports the HEAD~ and HEAD^ syntax.

git diff <left tree> <right tree>

Comparing Branches

Comparing files from two Git branches is easy:

git diff branch1 branch2

This will display the unified diff between the commits referenced by the two branch tips.

Combined Diff Format

By default, git diff displays the unified diff format between two commits. The combined diff format compares two or more user-specified files with one file and shows how that file differs from each of them. You can use the -c or --cc option to produce a combined diff.

How Do I Diff a File That Has Been Renamed?

You don't need to do anything special to diff a file that has been renamed. Git does not record renames explicitly, but it detects them in most cases by comparing file contents, and shows a unified output.

Listing Whitespace Errors

Git can detect whitespace problems and warn about them. You can configure which of these problems are reported in your output.

$ git diff --check # identify and list possible whitespace errors
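As a quick sketch of what --check reports, the example below introduces a line with a trailing space in a temporary repository. The file name and contents are made up. The command exits with a non-zero status when it finds a problem, so the status is captured explicitly.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'clean line\n' > demo.txt
git add demo.txt
git commit -q -m "Add demo.txt"

# Add a second line that ends with a trailing space.
printf 'clean line\ntrailing space \n' > demo.txt

# Capture the status: --check signals problems through its exit code.
status=0
report=$(git diff --check) || status=$?

echo "$report"
echo "exit status: $status"
```

Because of the exit status, git diff --check can be used in a hook or a CI step to reject changes that introduce whitespace errors.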

For Windows users, automatically convert line endings to CRLF on checkout, and convert back to LF on checkin:

$ git config --global core.autocrlf true

For Linux and macOS users, automatically convert CRLF to LF on checkin. This is useful if one of your contributors is on Windows and forgot to set the previous option:

git config --global core.autocrlf input

Here are some settings for git config --global core.whitespace. The value follows these rules:

  • Prepend - to an option to disable it
  • Omit an option to keep its default value
  • Separate options with commas

Enabled by default:

  • blank-at-eol: looks for spaces at the end of a line
  • blank-at-eof: notices blank lines at the end of a file
  • space-before-tab: looks for spaces before tabs at the beginning of a line

Disabled by default:

  • indent-with-non-tab: looks for lines that begin with spaces instead of tabs (and is controlled by the tabwidth option)
  • tab-in-indent: watches for tabs in the indentation portion of a line
  • cr-at-eol: tells Git that carriage returns at the end of lines are OK

Viewing Diff with Git Log

"My favorite way to log commits is git log -p" -Harvard CS

$ git log -p

Using the -p flag with the Git log command is a great way to include the patch (diff output) inline in the git log output.

Understanding Diff when Merging and Rebasing

When merging or rebasing in Git, you may have noticed references to ours and theirs. When merging, ours refers to the currently checked out branch, and theirs refers to the branch you are merging from:

git checkout merge-into-ours
git merge from-theirs

When rebasing in Git, they are flipped. The ours keyword refers to an anonymous branch that holds the result of the rebase so far, and theirs refers to the original branch being rebased.

You can run these commands in the middle of a merge or rebase to help orient yourself with respect to the changes being made:

git diff --ours
git diff --theirs
git diff --base

Git Diff and Submodules

The git diff --submodule flag can be used to show what actually changed in a submodule, instead of merely indicating that something changed.

$ git diff --submodule

If you don't want to type --submodule every time you run git diff, you can set it as the default format by setting the diff.submodule config value to "log":

$ git config diff.submodule log

Another way to accomplish this is with the submodule foreach command:

git submodule foreach 'git diff'

Diffing Binary Files

It is possible to diff various non-text files using Git Attributes. Essentially you configure a custom diff program that converts the binary to text first. Think of it like a plugin. Take Microsoft Word documents for example:

Create the .gitattributes file and add the following data:

*.docx diff=word

Install and configure docx2txt:

$ pip install docx2txt
$ git config diff.word.textconv docx2txt

Git diff will now show the text changes when comparing Microsoft Word documents. However, formatting changes will not show up.

Image files may be diffed in a similar way, by configuring Git to compare the EXIF metadata:

*.png diff=exif
$ git config diff.exif.textconv exiftool

Similar things can be done for Excel files and other binary files.
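The same mechanism can be tried without installing anything extra. The sketch below is an assumption-laden illustration rather than a recommended setup: it invents a diff driver named "hex" and uses the standard od utility as the converter, so that a small binary file is diffed as a hex dump.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A tiny binary file (it contains a NUL byte, so Git treats it as binary).
printf '\000\001\002' > data.bin
git add data.bin
git commit -q -m "Add data.bin"
printf '\000\001\003' > data.bin

# Without a converter, Git only reports that the binary files differ.
plain=$(git diff)

# Route *.bin files through a hypothetical "hex" driver that converts
# each version to text with od before diffing.
echo '*.bin diff=hex' > .gitattributes
git config diff.hex.textconv "od -A x -t x1"
converted=$(git diff)

echo "$plain"
echo "$converted"
```

The converted diff is for reading only. Since it is a diff of the converter's output and not of the file itself, it cannot be applied as a patch.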

Diffs in GitHub or GitLab

Most Git server software provides excellent graphical diff support. This is especially useful when reviewing a merge request or pull request.

How Do I Diff a File Over Time in the Same Branch?

One possible solution would be to use git log to figure out the hash of the relevant commit that changed the specified file in the specified time frame, and then use the regular git diff commands after that (6):

git log --before="yyyy-MM-dd" --after="yyyy-MM-dd" --follow -- <PATH-TO-FILE>
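A worked sketch of this approach follows. It creates three commits with fixed dates in a temporary repository, finds the one made in February 2020, and then diffs that commit against its parent. The dates, the file name, and the commit_on helper are all assumptions invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: commit all changes with a fixed date ($1) and message ($2).
commit_on() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -am "$2"
}

echo "v1" > notes.txt
git add notes.txt
commit_on "2020-01-10T12:00:00" "January change"
echo "v2" >> notes.txt
commit_on "2020-02-10T12:00:00" "February change"
echo "v3" >> notes.txt
commit_on "2020-03-10T12:00:00" "March change"

# Step 1: find the commit that touched the file in the time frame.
hash=$(git log --after="2020-02-01" --before="2020-02-28" \
  --format=%H --follow -- notes.txt)

# Step 2: use a regular git diff between that commit and its parent.
patch=$(git diff "$hash^" "$hash" -- notes.txt)

echo "commit in range: $hash"
echo "$patch"
```

If several commits fall inside the time frame, git log prints several hashes, and you would pick the two you want to compare before running git diff.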

Getting Back to Clean State

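To return to a clean git diff, clean up the working directory: git reset --hard discards uncommitted changes to tracked files, and git clean removes untracked files. Both commands are destructive, so make sure you no longer need those changes.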

Saving What is Shown in the Diff

To actually save the changes shown by git diff, use git commit or git stash.

How Do I Save a Result From Git Diff and Apply It Later?

The diff output is also called a patch. It can be used as input to other Git commands to actually apply those changes. This is called applying a patch in Git.

To save a patch from git diff, run one of the following:

$ git diff > myfile.patch
$ git format-patch -k --stdout HEAD^1..HEAD > myfile.mbox

If the patch was generated using git diff or the Unix diff command, then it can be applied using git apply myfile.patch. This will make changes to the working directory. The command is transactional, meaning it will either apply completely or abort.

If the patch was generated using the git format-patch command, then your job is easier because the patch already contains the commit author and commit message information. Apply the patch using the git am command. This command is able to automatically create commits instead of just changing the working directory.

As an aside, the "git am" command is built to read an mbox file, which is a simple, plain-text format for storing one or more email messages in one text file. This is one possible way to collaborate with git over email, rather than using git on a server.
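For the simpler git diff and git apply route, a round trip might look like the following sketch. It saves an uncommitted change as a patch, discards the change, and then applies the patch again. The repository, the file name, and the patch location are temporary values made up for the illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Make a change and save it as a patch outside the repository.
echo "second" >> notes.txt
patch_file=$(mktemp)
git diff > "$patch_file"

# Discard the change so the working directory is clean again.
git checkout -q -- notes.txt
before=$(cat notes.txt)

# Check that the patch applies cleanly, then apply it.
git apply --check "$patch_file"
git apply "$patch_file"
after=$(cat notes.txt)

echo "before apply: $before"
echo "after apply:"; echo "$after"
```

Running git apply --check first is optional, but it reports problems without modifying any files.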

Alternative Ways to Look at Diffs

The text output of a diff can be hard to read. Many graphical or external diff viewing programs exist to make life easier by displaying diffs side by side.

Common programs used are Sublime Text, Eclipse, VS Code, p4merge, IntelliJ, and Beyond Compare.

Simply run git difftool instead of git diff to open an external viewing program of your choice.

Run git difftool --tool-help to see what is available on your system.

To configure an external diff tool, run git config --global diff.external <executable name>. You may need to wrap the executable in a shell script to get the arguments passed correctly. Git passes 7 arguments to the external diff program (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), but you usually only need arguments 2 and 5 (the old file and the new file).

Environment variables:

GIT_EXTERNAL_DIFF is used as an override for the diff.external configuration value. If it’s set, Git will invoke this program when git diff is invoked.
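To show how such a wrapper script receives its arguments, here is a minimal sketch. The script name show-versions.sh and what it prints are invented for this example; a real wrapper would pass arguments 2 and 5 to your diff viewer instead of printing the file contents.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "puts 'cruel'" > demo.rb
git add demo.rb
git commit -q -m "Add demo.rb"
echo "puts 'nice'" > demo.rb

# Hypothetical wrapper script, kept outside the repository.
tools=$(mktemp -d)
cat > "$tools/show-versions.sh" <<'EOF'
#!/bin/sh
# Git calls this script with 7 arguments:
#   path old-file old-hex old-mode new-file new-hex new-mode
echo "== $1 =="
echo "old: $(cat "$2")"
echo "new: $(cat "$5")"
EOF
chmod +x "$tools/show-versions.sh"

# The environment variable applies to this one invocation only.
out=$(GIT_EXTERNAL_DIFF="$tools/show-versions.sh" git diff)
echo "$out"
```

Setting the variable on a single command line like this is a convenient way to try a wrapper before making it permanent with diff.external.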

Summary

In this article, we discussed the git diff command, how it works, how to use it to track changes in data, and a variety of different scenarios to apply it. You can use git diff at any point in the git life cycle to track changes in files, working directories, committed changes, and more.

Next Steps

If you're interested in learning more about how Git works under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how Git works at the code level. To do this we documented the first version of Git's code and discuss it in detail.

We hope you enjoyed this post! Feel free to shoot me an email at jacob@initialcommit.io with any questions or comments.

References

  1. Git Kernel Wiki - https://git.wiki.kernel.org/index.php/LinusTalk200705Transcript
  2. Wikipedia, Diff Unified Format - https://en.wikipedia.org/wiki/Diff#Unified_format
  3. Harvard, Diff Ref - https://cs61.seas.harvard.edu/site/ref/diff/
  4. Harvard, Git Ref - https://cs61.seas.harvard.edu/site/ref/git/
  5. GitHub, Python docx2txt - https://github.com/ankushshah89/python-docx2txt
  6. StackOverflow - https://stackoverflow.com/questions/9658110/git-diff-on-date/9658178#9658178
  7. Wikipedia, Git GUI Comparison - https://en.wikipedia.org/wiki/Comparison_of_Git_GUIs
