What is a Makefile? | Turning Git's Source Code into a Program

Introduction

In this article, we'll discuss how Make and Makefiles work at a high level, then as an example describe how Git's original Makefile works and walk through it line by line.

Turning Source Code into a Program

Before getting straight into Makefiles, let's briefly cover how source code gets turned into an actual program that can run on a computer. Source code consists of a set of files and folders that contain code. Make is often used for C or C++ programs being compiled on Linux systems. Therefore, it is common to see C (.c) and C++ (.cpp) source files in projects that use makefiles.

This source code usually needs to be converted into a form that the computer can understand. This process is called compilation or compiling. A program that performs this conversion is called a compiler.

Sometimes the compiler needs to be given certain pieces of information so it can properly do its job. This information may include:

  1. The names and locations of the source code (input) files to compile
  2. The set of compiled (output) programs to create
  3. The names and locations to put the compiled (output) programs
  4. Whether or not to apply any special options in the compilation process

The process of choosing a compiler, identifying the set of source code files to be included, performing preparation steps, and compiling the code into its final form is called building, or the build process.

What is Make?

Make is a build automation tool. It would be very tedious for a developer to manually run all of the build steps in sequence each time they want to build their program. Build automation tools like Make allow developers to describe the build steps and execute them all at once.

What is a Makefile?

Makefiles are text files that developers use to describe the build process for their programs. The make command can then be used to conveniently run the instructions in the Makefile.

Makefile Structure: Targets, Prerequisites (Dependencies), and Recipes

The basic structure of a makefile is as follows:

target1: prerequisite1
        recipe1

target2: prerequisite2
        recipe2

...

Makefiles contain a list of named sets of commands that can be used to perform different actions within your codebase. Each named set of commands is called a makefile rule, and is made up of a makefile target, one or more optional makefile prerequisites (or makefile dependencies), and a makefile recipe.

  1. A makefile target is a name used to reference the corresponding set of makefile commands to be executed. It often represents a compiled output file or executable.
  2. A makefile prerequisite or makefile dependency is a source file or another target that is required by the current target.
  3. A makefile recipe is the set of commands that are executed to build or run a specific target.
  4. A makefile rule is the combination of target, prerequisite/dependency, and recipe.

A target can either be a file or simply the name for a recipe of commands. When the target acts purely as a name for a set of commands, it is called a phony target. You can think of this kind of like a function name.
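As a concrete sketch of these terms, the hypothetical makefile below contains one rule whose target is a file and one rule whose target is phony. The names hello, hello.c, and greet are assumptions made up for illustration and are not part of Git's code. Note that each recipe line must begin with a tab character, not spaces.

```makefile
# File target: "hello" is rebuilt only when hello.c is newer than hello
# (or when hello does not exist yet).
hello: hello.c
	gcc -o hello hello.c

# Phony target: "greet" names a set of commands, not a file to be created.
# Declaring it under .PHONY tells make to run it even if a file named
# "greet" happens to exist.
.PHONY: greet
greet:
	echo "Hello from make"
```

Here hello is the target, hello.c is its prerequisite, and the gcc line is its recipe.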

Executing a Makefile Target

The default makefile target is the first target listed in the makefile. In the example above this is target1. This can be executed by simply running make in the same directory as your makefile.

You can change the default target in your makefile by adding .DEFAULT_GOAL := target2 to the file (this is a GNU Make feature, and placing it near the top keeps it easy to find). Now when you run make in your working directory, the target2 target will execute instead of target1.

You can explicitly run a specific target at any time by running make <target-name>. So you could run make target2 to run that target even if it isn't the default.
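The following minimal sketch shows the default-target behavior described above, assuming GNU Make. The target names match the earlier structure example, and the echo recipes are placeholders.

```makefile
# GNU Make only: make target2 the default instead of the first target.
.DEFAULT_GOAL := target2

.PHONY: target1 target2

target1:
	echo "running target1"

target2:
	echo "running target2"
```

With this file, running make runs the target2 recipe, while make target1 still runs target1 explicitly.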

Oftentimes the first (and default) target in a makefile is the all target, which calls other needed targets in the makefile in sequence. This is convenient since the developer can just run make in the working directory and the full build process will be completed without running multiple commands.

Another convention is to define a target called clean which deletes all of the compiled output files and executables from the last build. This allows the command make clean to prepare the local filesystem for subsequent builds.
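The all and clean conventions can be sketched as follows. The program names hello and goodbye and their source files are hypothetical, chosen only to show the shape of such a makefile.

```makefile
# "all" is the first target, so a bare `make` builds every program.
all: hello goodbye

hello: hello.c
	gcc -o hello hello.c

goodbye: goodbye.c
	gcc -o goodbye goodbye.c

# Delete the build outputs; the source files are left untouched.
clean:
	rm -f hello goodbye

# Neither "all" nor "clean" produces a file with that name.
.PHONY: all clean
```

Because all has no recipe of its own, its only job is to make sure each of its prerequisites is up to date.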

By default, make prints each recipe command to the shell before running it. You can suppress this for a given command by adding an @ sign at the start of that recipe line in the makefile.

For example, the line @echo Compiling source files will still display "Compiling source files" text on the shell, but will not show the echo command itself. Similarly the line @touch filename.ext will still create the file, but it will not display the touch command on the command line as if you typed it in manually.
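To see the difference side by side, consider this small sketch with a hypothetical phony target named demo:

```makefile
.PHONY: demo
demo:
	echo "this command is printed before it runs"
	@echo "this command runs without being printed"
```

Running make demo shows the first echo command itself followed by its output, but only the output of the second.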

Example Makefile: Git's Original Makefile

Below is the original Makefile for Git. It is used to invoke the gcc C compiler to build binary executable files for each of the original 7 git commands:

  1. init-db
  2. update-cache
  3. cat-file
  4. show-diff
  5. write-tree
  6. read-tree
  7. commit-tree

This Makefile is typically invoked in one of 3 ways, each of which runs a different target, by running one of the following commands from a shell in the same directory as the Makefile:

  1. make clean: This removes all previously built executables and build files from the working directory.
  2. make backup: This first runs make clean and then backs up the current directory into a tar archive.
  3. make: This builds the codebase and creates the 7 git executables.

Enough talk - here is the code from Git's first Makefile:

CFLAGS=-g # The `-g` compiler flag tells gcc to add debug symbols to the executable for use with a debugger.
CC=gcc # Use the `gcc` C compiler.

# Specify the names of all executables to make.
PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file

all: $(PROG)

install: $(PROG)
	install $(PROG) $(HOME)/bin/

# Include the following dependencies in the build.
LIBS= -lssl

# Specify which compiled output (.o files) to use for each executable.
init-db: init-db.o

update-cache: update-cache.o read-cache.o
	$(CC) $(CFLAGS) -o update-cache update-cache.o read-cache.o $(LIBS)

show-diff: show-diff.o read-cache.o
	$(CC) $(CFLAGS) -o show-diff show-diff.o read-cache.o $(LIBS)

write-tree: write-tree.o read-cache.o
	$(CC) $(CFLAGS) -o write-tree write-tree.o read-cache.o $(LIBS)

read-tree: read-tree.o read-cache.o
	$(CC) $(CFLAGS) -o read-tree read-tree.o read-cache.o $(LIBS)

commit-tree: commit-tree.o read-cache.o
	$(CC) $(CFLAGS) -o commit-tree commit-tree.o read-cache.o $(LIBS)

cat-file: cat-file.o read-cache.o
	$(CC) $(CFLAGS) -o cat-file cat-file.o read-cache.o $(LIBS)

# Rebuild these object files whenever the C header file cache.h changes.
read-cache.o: cache.h
show-diff.o: cache.h

# Define the steps to run during the `make clean` command.
clean:
	rm -f *.o $(PROG) temp_git_file_* # Remove these files from the current directory.

# Define the steps to run during the `make backup` command.
backup: clean
	cd .. ; tar czvf babygit.tar.gz baby-git # Back up the current directory into a tar archive.

As seen in the example above, you can add makefile comments by starting the line with a # sign.

Makefile Variables - Advanced Make Syntax

Variables can be defined in the Makefile to hold specific values. In the Makefile above, words such as CFLAGS and CC are ordinary variable names that store the values that come after the equals sign. They are not keywords, although they are conventional names that Make's built-in rules also read when a rule has no recipe of its own. Variable references like $(CFLAGS) can be used later in the Makefile to substitute in the variable values where needed. This is convenient since we can use a variable name in multiple places, while only updating it in one place if the value changes.
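Here is a minimal sketch of variable definition and substitution. The file names hello and hello.c and the extra -Wall flag are assumptions for illustration only.

```makefile
# Define each value once...
CC = gcc
CFLAGS = -g -Wall

# ...and reference it wherever it is needed.
# After substitution the recipe is: gcc -g -Wall -o hello hello.c
hello: hello.c
	$(CC) $(CFLAGS) -o hello hello.c
```

A variable can also be overridden for a single run from the command line, for example make CFLAGS=-O2, without editing the makefile.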

Specifying the Compiler in Makefiles

Git is written in C, so this Makefile is tailored to a C build process using a C-specific pattern.

The first line CFLAGS=-g specifies the compiler flags - special compiler options - to use during compilation. In this case, the -g flag tells the compiler to add debug symbols to the compiled output, so that the program can be inspected with a debugger.

The second line CC=gcc identifies the actual compiler to use. GCC is the GNU Compiler Collection. It supports compilation of code in several programming languages including C, C++, Fortran, and more.

Specifying the Executables in Makefiles

The next variable definition creates a build variable called PROG which contains the names of the executables we'll be creating.

Linking External Libraries in Makefiles

We'll quickly skip ahead to the line which defines the LIBS variable. This stores the external libraries that we want to link into the build process. In this case, we link in the SSL library which allows Git to access cryptographic functions like hashing.

Make Targets and Commands

Throughout the Makefile, there are multiple lines that start with a name followed by a colon such as all:, install:, init-db:, etc. As we mentioned earlier, each of these is called a target. Each target essentially maps to a command that you can specify when running Make, in the form make target.

For example, if you open a terminal window and browse to this Makefile's directory, you could run the make all command to run Make on the all target. Similarly you could run make install to run Make on the install target. If no target is specified, the all target will be used by default, because it is the first target in the file.

When Make runs a target, it executes the instructions associated with that target in the Makefile.

The All Target

Back to the Makefile, the all: $(PROG) line states that every name listed in $(PROG) is a prerequisite of all. When Make is run without specifying a target, it therefore brings each of those targets up to date. Since $(PROG) lists all 7 of the Baby Git executables, each of them will be built.

The Install Target

The next target in the Makefile is install. It is run at the command line using make install. This starts the same way as the all target, by specifying the executables to build using $(PROG). But then it uses the install command to copy those built executables into the bin directory inside the user's home directory ($(HOME)/bin/).

Baby Git Program Targets

Now for the targets corresponding to the executable names:

  • init-db:
  • update-cache:
  • show-diff:
  • write-tree:
  • read-tree:
  • commit-tree:
  • cat-file:

Each one of these targets specifies which compiled C object (.o) files we want in each of our executables. Below that, each one specifies the compiler command to run based on the build variables specified earlier in the file.

The first rule, for init-db, is very simple: it lists a single object file and has no recipe of its own, so Make links it using its built-in rules: init-db: init-db.o
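For a rule like init-db: init-db.o that has no recipe, GNU Make applies its built-in implicit rules. Their effect is roughly what the explicit rules below would do. This is a sketch of the behavior, not the exact text of Make's built-in rules, which also reference other variables such as CPPFLAGS, LDFLAGS, and LDLIBS.

```makefile
CC = gcc
CFLAGS = -g

# Compile the C source file into an object file.
init-db.o: init-db.c
	$(CC) $(CFLAGS) -c -o init-db.o init-db.c

# Link the object file into the final executable.
init-db: init-db.o
	$(CC) -o init-db init-db.o
```

This is also why none of the .o files in Git's Makefile need explicit compile rules: Make already knows how to build a .o file from a .c file with the same base name.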

The other executables (we'll take update-cache as an example) link together multiple C object (.o) files:

update-cache: update-cache.o read-cache.o
     $(CC) $(CFLAGS) -o update-cache update-cache.o read-cache.o $(LIBS)

The second line above gets converted to the following after variable substitution:

gcc -g -o update-cache update-cache.o read-cache.o -lssl

Header File Dependencies in Makefiles

After the program targets, there are two lines that list a C header (.h) file as a prerequisite of an object (.o) file. The only header file in the Baby Git codebase is cache.h, which is listed as a prerequisite of read-cache.o and show-diff.o. Headers are not linked; these lines tell Make to recompile those object files whenever cache.h changes. C header files typically contain function declarations, type definitions, and macros that are shared by multiple files in the codebase.
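A recipe-less line such as read-cache.o: cache.h only adds a prerequisite; the compile step still comes from Make's built-in rule for building a .o file from a .c file. Written out by hand, the combined effect would look roughly like the sketch below, which approximates the built-in behavior rather than quoting it.

```makefile
CC = gcc
CFLAGS = -g

# read-cache.o is out of date if either read-cache.c or cache.h is newer.
# Only the .c file is passed to the compiler; cache.h is pulled in by the
# #include directive inside the source file.
read-cache.o: read-cache.c cache.h
	$(CC) $(CFLAGS) -c -o read-cache.o read-cache.c
```

Without the header prerequisite, editing cache.h would not trigger a rebuild, and the object files could end up compiled against stale declarations.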

The clean Target

This target is invoked using make clean and simply deletes all compiled code and executables from the working directory. It leaves the source files alone so that the program can be built again.

The backup Target

This target is invoked using make backup. First it invokes the clean target. Then it backs up the source code files in the working directory as a tar archive in the parent directory.

GNU Make Manual

A valuable resource when working with make is the GNU Make Manual. It is the official documentation for GNU Make and describes many features and functions that we couldn't cover in this article.

Summary

In this article we discussed the basic concepts, terminology and structure of makefiles. We then walked through Git's first Makefile line by line. We hope it helped you understand how Makefiles work and how they are implemented in practice.

Next Steps

If you're interested in learning more about how Git works under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how Git works at the code level. To do this, we documented the first version of Git's code and discussed it in detail.

References

  1. GNU Make Manual - https://www.gnu.org/software/make/manual/make.html

Final Notes