A 16-Year History of the Git Init Command


Introduction

Git is a fascinating project with a lot of interesting nuggets to learn from. At its core lies a simple data model that represents file content and changes in the form of blobs, trees, and commits.

Git elegantly layers several important technologies on top of this model, including data compression, hash functions, a content-addressable database (also known as the object database or object store), a directed acyclic graph, and distributed networking, to create the efficient and performant tool we devs use every day.

However, before any of these technologies can be put to use, a Git repository must first be created and initialized. This is accomplished using the git init command. Developers often learn this command without understanding what it really does.

In this article, we'll discuss how the git init command originated and how it has changed over the 16 years since Git's inception in April 2005. To do this, we will examine a snapshot of Git's code from each year between 2005 and 2021 to get a feel for how the git init command and the object database have evolved from a code perspective.

The method

We'll start by looking directly at the initial commit of Git's code. Git is mostly written in the C programming language.

In order to access Git's initial commit, I cloned down Git's official codebase using the command git clone https://github.com/git/git.

I then located the first commit in the whole repository by running git log --reverse, which lists all commits starting from the oldest. Git's initial commit looks like this in the log (note Linus Torvalds' dark sense of humor):

commit e83c5163316f89bfbde7d9ab23ca2e25604af290
Author: Linus Torvalds <torvalds@ppc970.osdl.org>
Date:   Thu Apr 7 15:13:13 2005 -0700

    Initial revision of "git", the information manager from hell

Now that we know the commit ID of Git's very first commit, the goal was to locate the code responsible for initializing Git's object database (i.e. the code that runs when the git init command is executed) and see how this changed over the years.

I decided to grab the first commit of each new year from 2006 through 2021. This was pretty easy using the --since=<date> option of the git log command, as follows:

git log --since="2006-01-01" --reverse

This lists all commits made since January 1st, 2006, in reverse order. Therefore, the first commit in that list is the first commit made to Git's history in 2006 (not counting any history rearrangements). I reran this command for each year from 2006 through 2021 and noted the ID of the first commit made in each year.

The last step was to locate the file that the object database code lived in, in each commit. This was a tad tricky, since the file has been renamed a few times over the past 16 years, but it wasn't hard to track down.

When Git was first created, the git init command didn't exist yet. Instead, it was simply called init-db. When this command was run, Git initialized an empty Git repo in the current directory. The code for this was in a file called init-db.c in the project root directory.

Within 2 years, this file was renamed to builtin-init-db.c. And then around 2010, the file was moved and renamed to builtin/init-db.c, which is how it exists today in Git's current source code.

Now that I had a list of Git's first commit in each year, and the name of the file that I was looking for, I ran a diff between each pair of consecutive commits.

Here is the command I used to compare the same file (in this example init-db.c) across two commits:

git diff <commit1> <commit2> -- init-db.c

Or, if the file name changed between two commits, the command needed to be altered to:

git diff <commit1>:init-db.c <commit2>:builtin-init-db.c

Using these diffs, I put together the Git history in this article. Each section below outlines the notable changes to the git init (or init-db) command and object database in the preceding year.

April 7th, 2005: Git's Initial Commit

As we saw previously, Git's first commit has ID e83c5163316f89bfbde7d9ab23ca2e25604af290 and was made by Linus Torvalds on April 7th, 2005. It contains 10 files and only about 1000 lines of code, and it actually works! Most of these 10 files directly correspond to one of the original 7 Git commands.

However, for this article, we are interested only in the init-db.c file, which contains the code related to the initialization of a Git repository. I pasted the code from the original init-db.c file below and added inline comments to describe what it does:

/*
 *  The purpose of this file is to be compiled into an executable
 *  called `init-db`. When `init-db` is run from the command line
 *  it will initialize a Git repository by creating the object
 *  store (a directory called `.dircache/objects` by default)
 *  which will store the content that users commit in order to
 *  track the history of the repository over time.
 *
 *  This whole file (i.e. everything in the main function) will run
 *  when ./init-db executable is run from the command line.
 */

#include "cache.h"

/*
 * Function: `main`
 *
 * Parameters:
 *      -argc: The number of command-line arguments supplied, including the
 *             command itself.
 *      -argv: An array of the command line arguments, including the command 
 *             itself.
 *
 * Purpose: Standard `main` function definition. Runs when the executable 
 *          `init-db` is run from the command line.
 */
int main(int argc, char **argv)
{
    /* 
     * The `char *` format of the variables below allows them to be used as 
     * strings of characters instead of just holding one single character. 
     * Just think of these as strings.
     */
    char *sha1_dir, *path;

    /* Declaring three integers to be used later. */
    int len, i, fd;

    /*
     * Attempt to create a directory called `.dircache` in the current 
     * directory. If it fails, `mkdir()` will return -1 and the program will 
     * print an error message and exit.
     */
    if (MKDIR(".dircache") < 0) {
        perror("unable to create .dircache");
        exit(1);
    }
    
    /*
     * Read the environment variable whose name is given by the
     * `DB_ENVIRONMENT` macro in "cache.h" (the string "SHA1_FILE_DIRECTORY").
     * This lets the user choose a custom path for the object store. If the
     * variable is not set (and it most likely won't be), getenv() returns a
     * null pointer.
     */
    sha1_dir = getenv(DB_ENVIRONMENT);

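    /*
     * This block only runs if the environment variable above was set. If it
     * names an existing directory, the object store is assumed to be in
     * place already and the program returns early (with status 1 in this
     * listing). Otherwise, a warning is started on stderr and execution
     * falls through to the default path below.
     */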
    if (sha1_dir) {
        struct stat st;
        if (!(stat(sha1_dir, &st) < 0) && S_ISDIR(st.st_mode))
            return 1;
        fprintf(stderr, "DB_ENVIRONMENT set to bad directory %s: ", sha1_dir);
    }

    /*
     * Fall through here if a custom object store path was not specified or
     * was not valid. In the latter case, the message printed below completes
     * the warning line started above.
     */

    /*
     * Set `sha1_dir` to the default value `.dircache/objects` as defined in 
     * "cache.h", then print a message to the screen conveying this.
     */
    sha1_dir = DEFAULT_DB_ENVIRONMENT;
    fprintf(stderr, "defaulting to private storage area\n");

    /*
     * Set `len` to the length of the string in `sha1_dir`, i.e., the length
     * of the string `.dircache/objects`. This will be used later to build the 
     * subdirectories in the object database, where hash-indexed objects will 
     * be stored.
     */
    len = strlen(sha1_dir);

    /*
     * Attempt to create a directory inside `.dircache` called `objects`. If
     * `mkdir()` fails (returns -1) for any reason other than the directory
     * already existing (`EEXIST`), the program prints an error and exits.
     */
    if (MKDIR(sha1_dir) < 0) {
        if (errno != EEXIST) {
            perror(sha1_dir);
            exit(1);
        }
    }

    /*
     * Allocate space for `path`: `len` bytes for `sha1_dir` plus 40 spare
     * bytes, more than enough for the "/xx" suffix and the terminating NUL
     * written by sprintf() below.
     */
    path = malloc(len + 40);

    /* Copy the `sha1_dir` to `path`. */
    memcpy(path, sha1_dir, len);

    /*
     * Execute this loop 256 times to create the 256 subdirectories inside the 
     * `.dircache/objects/` directory. The subdirectories will be named `00`
     * to `ff`, which are the hexadecimal representations of the numbers 0 to 
     * 255. Each subdirectory will be used to hold the objects whose SHA1 hash 
     * values in hexadecimal representation start with those two digits.
     */
    for (i = 0; i < 256; i++) {
        /*
         * Convert `i` to a two-digit hexadecimal number and append it to the 
         * path variable after the `.dircache/objects/` part. That way, each 
         * time through the loop we build up one of the following paths: 
         * `.dircache/objects/00`, `.dircache/objects/01`, ...,
         * `.dircache/objects/fe`, `.dircache/objects/ff`.
         */
        sprintf(path+len, "/%02x", i);

        /*
         * Attempt to create the current subdirectory. If `mkdir()` fails
         * (returns -1) for any reason other than the directory already
         * existing (`EEXIST`), the program prints an error and exits.
         */
        if (MKDIR(path) < 0) {
            if (errno != EEXIST) {
                perror(path);
                exit(1);
            }
        }
    }
    return 0;
}

The main things to take note of here are:

  1. The file basically contains one big main() function that is executed when the command ./init-db is entered on the command line. Although main() receives argc and argv, this first version never looks at its command-line arguments.

  2. The code attempts to create the hidden directory .dircache (which is the equivalent of the modern .git folder), and then creates the objects subdirectory inside that.

  3. Finally, a for loop is used to create an additional 256 subdirectories in the .dircache/objects folder. The subdirectories are named 00 to ff, which are the hexadecimal representations of the numbers 0 to 255. Each subdirectory will be used to hold the objects whose SHA1 hash values in hexadecimal representation start with those two digits.

Here are the final contents of the .dircache/objects/ directory after running the ./init-db command in an empty directory:

00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff

Now that we have an idea of how Git's original object database is structured at the outset in April 2005, let's fast forward to January 2006 to see what changed.

January 4th, 2006

In Git's first new year, several changes were made to its initialization code. Much of the new code is fairly long, so I'll summarize the changes in words and show diffs of the more interesting ones:

  1. The hidden .dircache directory was renamed to the comforting .git that we know and love:
-    if (mkdir(".dircache", 0700) < 0) {
-        perror("unable to create .dircache");
-        exit(1);
+    git_dir = getenv(GIT_DIR_ENVIRONMENT);
+    if (!git_dir) {
+        git_dir = DEFAULT_GIT_DIR_ENVIRONMENT;
+        fprintf(stderr, "defaulting to local storage area\n");
     }
+    safe_create_dir(git_dir);
  2. The init command was renamed from init-db to git-init-db:
+static const char init_db_usage[] =
+"git-init-db [--template=<template-directory>]";
  3. An optional --template command-line option was also added, letting the user specify a custom directory to copy templates (like hooks) from. The default templates are installed along with Git (typically under /usr/share/git-core/templates) and copied into the .git folder when the repo is initialized.

  4. New functions were added to safely create directories on the local filesystem and to copy files.

  5. A new function was added to create the default files and directories, such as the .git/refs/heads and .git/refs/tags directories:

 ... 
+    static char path[PATH_MAX];
+    memcpy(path, git_dir, len);
+
+    /*
+     * Create .git/refs/{heads,tags}
+     */
+    strcpy(path + len, "refs");
+    safe_create_dir(path);
+    strcpy(path + len, "refs/heads");
+    safe_create_dir(path);
+    strcpy(path + len, "refs/tags");
+    safe_create_dir(path);
  6. Create the default symlink from .git/HEAD to the master branch:
+    /*
+     * Create the default symlink from ".git/HEAD" to the "master"
+     * branch, if it does not exist yet.
+     */
+    strcpy(path + len, "HEAD");
+    if (read_ref(path, sha1) < 0) {
+        if (create_symref(path, "refs/heads/master") < 0)
+            exit(1);
+    }
  7. Git no longer creates the 256 subdirectories within .git/objects/ when a repository is initialized. Instead, it creates each one later, only when the first object (blob, tree, or commit) whose hash starts with those 2 characters is written.

Furthermore, Git introduced packfiles to optimize storage and network transfer performance. Packfiles are stored in the .git/objects/pack directory and referenced from the .git/objects/info directory. For more details, check out Git's official documentation on packfiles.

Check out the relevant diff for these structural changes made to the object store:

diff --git a/init-db.c b/init-db.c
index 25dc13fe10..ead37b5ed8 100644
--- a/init-db.c
+++ b/init-db.c
@@ -1,51 +1,279 @@
-    for (i = 0; i < 256; i++) {
-        sprintf(path+len, "/%02x", i);
-        if (mkdir(path, 0700) < 0) {
-            if (errno != EEXIST) {
-                perror(path);
-                exit(1);
-            }
-        }
-    }
+
+    safe_create_dir(sha1_dir);
+    strcpy(path+len, "/pack");
+    safe_create_dir(path);
+    strcpy(path+len, "/info");
+    safe_create_dir(path);
     return 0;
 }

January 1st, 2007

  1. The init-db.c source file had its name changed to builtin-init-db.c.
  2. The main() function's name was changed to cmd_init_db():
-int main(int argc, char **argv)
+int cmd_init_db(int argc, const char **argv, const char *prefix)
  3. The --shared option was added to configure a repository to be shared among several users. This allows users belonging to the same group to push into that repository:
 static const char init_db_usage[] =
-"git-init-db [--template=<template-directory>]";
+"git-init-db [--template=<template-directory>] [--shared]";
 ... 
     for (i = 1; i < argc; i++, argv++) {
-        char *arg = argv[1];
+        const char *arg = argv[1];
         if (!strncmp(arg, "--template=", 11))
             template_dir = arg+11;
+        else if (!strcmp(arg, "--shared"))
+            shared_repository = PERM_GROUP;
+        else if (!strncmp(arg, "--shared=", 9))
+            shared_repository = git_config_perm("arg", arg+9);
         else
-            die(init_db_usage);
+            usage(init_db_usage);
     }
 ... 
+    if (shared_repository) {
+        char buf[10];
+        /* We do not spell "group" and such, so that
+         * the configuration can be read by older version
+         * of git.
+         */
+        sprintf(buf, "%d", shared_repository);
+        git_config_set("core.sharedrepository", buf);
+        git_config_set("receive.denyNonFastforwards", "true");
+    }

January 1st, 2008

  1. The init command was renamed from git-init-db to git-init.
  2. The -q or --quiet flag was added to suppress output from the command:
 static const char init_db_usage[] =
-"git-init-db [--template=<template-directory>] [--shared]";
+"git-init [-q | --quiet] [--template=<template-directory>] [--shared]";
 ... 
     for (i = 1; i < argc; i++, argv++) {
         const char *arg = argv[1];
 ... 
+        else if (!strcmp(arg, "-q") || !strcmp(arg, "--quiet"))
+                quiet = 1;
         else
             usage(init_db_usage);
     }
  3. Whether a repository is bare (a repo without a working tree, typically only meant to be pushed to and pulled from) is now recorded explicitly in the core.bare config setting:
-    /* Enable logAllRefUpdates if a working tree is attached */
-    if (!is_bare_git_dir(git_dir))
-        git_config_set("core.logallrefupdates", "true");
+    if (is_bare_repository())
+        git_config_set("core.bare", "true");
+    else {
+        const char *work_tree = get_git_work_tree();
+        git_config_set("core.bare", "false");
+        /* allow template config file to override the default */
+        if (log_all_ref_updates == -1)
+            git_config_set("core.logallrefupdates", "true");
+        if (work_tree != git_work_tree_cfg)
+            git_config_set("core.worktree", work_tree);
+    }

January 2nd, 2009

  1. Added a check for whether the filesystem is case-insensitive:
+        /* Check if the filesystem is case-insensitive */
+        path[len] = 0;
+        strcpy(path + len, "CoNfIg");
+        if (!access(path, F_OK))
+            git_config_set("core.ignorecase", "true");
  2. The init command was renamed from git-init to git init and the --bare flag was added to initialize a bare repository directly:
 static const char init_db_usage[] =
-"git-init [-q | --quiet] [--template=<template-directory>] [--shared]";
+"git init [-q | --quiet] [--bare] [--template=<template-directory>] [--shared[=<permissions>]]";
 ... 
     for (i = 1; i < argc; i++, argv++) {
         const char *arg = argv[1];
         if (!prefixcmp(arg, "--template="))
             template_dir = arg+11;
 ... 
+        else if (!strcmp(arg, "--bare")) {
+            static char git_dir[PATH_MAX+1];
+            is_bare_repository_cfg = 1;
+            setenv(GIT_DIR_ENVIRONMENT, getcwd(git_dir,
+                        sizeof(git_dir)), 0);
+        } else if (!strcmp(arg, "--shared"))
 ... 
     }

January 2nd, 2010

  1. Command-line argument parsing was changed from a hand-written for loop to an array of option descriptors (struct option) processed by Git's parse_options() function. See the diff below:
diff --git a/builtin-init-db.c b/builtin-init-db.c
index d30c3fe2ca..dd84caecbc 100644
--- a/builtin-init-db.c
+++ b/builtin-init-db.c
@@ -377,27 +393,65 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
     const char *git_dir;
     const char *template_dir = NULL;
     unsigned int flags = 0;
-    int i;
-
-    for (i = 1; i < argc; i++, argv++) {
-        const char *arg = argv[1];
-        if (!prefixcmp(arg, "--template="))
-            template_dir = arg+11;
-        else if (!strcmp(arg, "--bare")) {
-            static char git_dir[PATH_MAX+1];
-            is_bare_repository_cfg = 1;
-            setenv(GIT_DIR_ENVIRONMENT, getcwd(git_dir,
-                        sizeof(git_dir)), 0);
-        } else if (!strcmp(arg, "--shared"))
-            init_shared_repository = PERM_GROUP;
-        else if (!prefixcmp(arg, "--shared="))
-            init_shared_repository = git_config_perm("arg", arg+9);
-        else if (!strcmp(arg, "-q") || !strcmp(arg, "--quiet"))
-            flags |= INIT_DB_QUIET;
-        else
-            usage(init_db_usage);
+    const struct option init_db_options[] = {
+        OPT_STRING(0, "template", &template_dir, "template-directory",
+                "provide the directory from which templates will be used"),
+        OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
+                "create a bare repository", 1),
+        { OPTION_CALLBACK, 0, "shared", &init_shared_repository,
+            "permissions",
+            "specify that the git repository is to be shared amongst several users",
+            PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+        OPT_BIT('q', "quiet", &flags, "be quiet", INIT_DB_QUIET),
+        OPT_END()
+    };
+
+    argc = parse_options(argc, argv, prefix, init_db_options, init_db_usage, 0);

Note that originally (the removed lines, prefixed with -), a for loop was used to iterate through the command-line options supplied to the init command. If an option string matched one of a set of predefined keywords, such as --template, --bare, --shared, or --quiet, the corresponding setting was recorded.

The updated version (the added lines, prefixed with +) instead describes every accepted option in an array of struct option entries called init_db_options[]. This array is passed to parse_options(), which matches the user-supplied arguments against it and stores each option's value in the variable that the entry points to.

January 3rd, 2011

  1. The code that creates the object database structure was refactored into a single function called create_object_directory(). The diff also shows the source file moving from builtin-init-db.c to builtin/init-db.c:
diff --git a/builtin-init-db.c b/builtin/init-db.c
index dd84caecbc..e3af9eaa87 100644
--- a/builtin-init-db.c
+++ b/builtin/init-db.c
@@ -280,11 +294,26 @@ static int create_default_files(const char *template_path)
     return reinit;
 }
 
+static void create_object_directory(void)
+{
+    const char *object_directory = get_object_directory();
+    int len = strlen(object_directory);
+    char *path = xmalloc(len + 40);
+
+    memcpy(path, object_directory, len);
+
+    safe_create_dir(object_directory, 1);
+    strcpy(path+len, "/pack");
+    safe_create_dir(path, 1);
+    strcpy(path+len, "/info");
+    safe_create_dir(path, 1);
+
+    free(path);
+}
+
 int init_db(const char *template_dir, unsigned int flags)
 {
-    const char *sha1_dir;
-    char *path;
-    int len, reinit;
+    int reinit;
 
     safe_create_dir(get_git_dir(), 0);
 
@@ -299,16 +328,7 @@ int init_db(const char *template_dir, unsigned int flags)
 
     reinit = create_default_files(template_dir);
 
-    sha1_dir = get_object_directory();
-    len = strlen(sha1_dir);
-    path = xmalloc(len + 40);
-    memcpy(path, sha1_dir, len);
-
-    safe_create_dir(sha1_dir, 1);
-    strcpy(path+len, "/pack");
-    safe_create_dir(path, 1);
-    strcpy(path+len, "/info");
-    safe_create_dir(path, 1);
+    create_object_directory();
 
     if (shared_repository) {
         char buf[10];

January 1st, 2012

  1. Added the command-line option --separate-git-dir, which allows the user to supply a custom path for storing the repository's Git directory instead of ./.git. In that case, .git becomes a plain text file containing the path to the real Git directory. If the repo has already been initialized (i.e. the init command was already run in the same project folder), the existing .git folder is moved to the supplied path:
     const struct option init_db_options[] = {
         OPT_STRING(0, "template", &template_dir, "template-directory",
-                "provide the directory from which templates will be used"),
+                "directory from which templates will be used"),
         OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
                 "create a bare repository", 1),
         { OPTION_CALLBACK, 0, "shared", &init_shared_repository,
@@ -427,11 +490,16 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
             "specify that the git repository is to be shared amongst several users",
             PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
         OPT_BIT('q', "quiet", &flags, "be quiet", INIT_DB_QUIET),
+        OPT_STRING(0, "separate-git-dir", &real_git_dir, "gitdir",
+               "separate git dir from working tree"),
         OPT_END()
     };
  2. Clarified and expanded error messages, warnings, and command output.

January 1st, 2013

No notable changes.

January 1st, 2014

No notable changes.

January 5th, 2015

No notable changes.

January 2nd, 2016

  1. Updated path variables from fixed-size character arrays, such as char path[PATH_MAX], to structs of type strbuf, as follows:
struct strbuf path = STRBUF_INIT;

Here is how the strbuf struct is defined:

/**
 * This is the string buffer structure. The `len` member can be used to
 * determine the current length of the string, and `buf` member provides
 * access to the string itself.
 */
struct strbuf {
        size_t alloc;
        size_t len;
        char *buf;
};

Here are the benefits of using the custom strbuf struct, according to Git's contemporary documentation:

/**
 * strbuf's are meant to be used with all the usual C string and memory
 * APIs. Given that the length of the buffer is known, it's often better to
 * use the mem* functions than a str* one (memchr vs. strchr e.g.).
 * Though, one has to be careful about the fact that str* functions often
 * stop on NULs and that strbufs may have embedded NULs.
 *
 * A strbuf is NUL terminated for convenience, but no function in the
 * strbuf API actually relies on the string being free of NULs.
 *
 * strbufs have some invariants that are very important to keep in mind:
 *
 *  - The `buf` member is never NULL, so it can be used in any usual C
 *    string operations safely. strbuf's _have_ to be initialized either by
 *    `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
 *
 *    Do *not* assume anything on what `buf` really is (e.g. if it is
 *    allocated memory or not), use `strbuf_detach()` to unwrap a memory
 *    buffer from its strbuf shell in a safe way. That is the sole supported
 *    way. This will give you a malloced buffer that you can later `free()`.
 *
 *    However, it is totally safe to modify anything in the string pointed by
 *    the `buf` member, between the indices `0` and `len-1` (inclusive).
 *
 *  - The `buf` member is a byte array that has at least `len + 1` bytes
 *    allocated. The extra byte is used to store a `'\0'`, allowing the
 *    `buf` member to be a valid C-string. Every strbuf function ensure this
 *    invariant is preserved.
 *
 *    NOTE: It is OK to "play" with the buffer directly if you work it this
 *    way:
 *
 *        strbuf_grow(sb, SOME_SIZE); <1>
 *        strbuf_setlen(sb, sb->len + SOME_OTHER_SIZE);
 *
 *    <1> Here, the memory array starting at `sb->buf`, and of length
 *    `strbuf_avail(sb)` is all yours, and you can be sure that
 *    `strbuf_avail(sb)` is at least `SOME_SIZE`.
 *
 *    NOTE: `SOME_OTHER_SIZE` must be smaller or equal to `strbuf_avail(sb)`.
 *
 *    Doing so is safe, though if it has to be done in many places, adding the
 *    missing API to the strbuf module is the way to go.
 *
 *    WARNING: Do _not_ assume that the area that is yours is of size `alloc
 *    - 1` even if it's true in the current implementation. Alloc is somehow a
 *    "private" member that should not be messed with. Use `strbuf_avail()`
 *    instead.
*/

January 2nd, 2017

  1. Assorted code streamlining and refactoring for clarity.

January 1st, 2018

No notable changes.

January 2nd, 2019

No notable changes.

January 1st, 2020

No notable changes.

January 4th, 2021

  1. Added the command-line option -b or --initial-branch for the user to specify the initial branch name, which overrides the default name (master, unless a different default is configured with the init.defaultBranch setting):
     const struct option init_db_options[] = {
         OPT_STRING(0, "template", &template_dir, N_("template-directory"),
                 N_("directory from which templates will be used")),
@@ -494,11 +560,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
         OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
         OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
                N_("separate git dir from working tree")),
+        OPT_STRING('b', "initial-branch", &initial_branch, N_("name"),
+               N_("override the name of the initial branch")),
+        OPT_STRING(0, "object-format", &object_format, N_("hash"),
+               N_("specify the hash algorithm to use")),
         OPT_END()
     };
 
     argc = parse_options(argc, argv, prefix, init_db_options, init_db_usage, 0);
  2. Refactored the code related to initializing the HEAD ref so that it points to the default initial branch name, since it is no longer necessarily assumed to be named master:
     /*
-     * Create the default symlink from ".git/HEAD" to the "master"
-     * branch, if it does not exist yet.
+     * Point the HEAD symref to the initial branch with if HEAD does
+     * not yet exist.
      */
     path = git_path_buf(&buf, "HEAD");
     reinit = (!access(path, R_OK)
           || readlink(path, junk, sizeof(junk)-1) != -1);
     if (!reinit) {
-        if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
+        char *ref;
+
+        if (!initial_branch)
+            initial_branch = git_default_branch_name(quiet);
+
+        ref = xstrfmt("refs/heads/%s", initial_branch);
+        if (check_refname_format(ref, 0) < 0)
+            die(_("invalid initial branch name: '%s'"),
+                initial_branch);
+
+        if (create_symref("HEAD", ref, NULL) < 0)
             exit(1);
+        free(ref);
     }
  3. Added the command-line option --object-format (see the first code snippet in this section above) for the user to specify the hash algorithm to use. The default is sha1, but it can also be set to sha256. Note that, at the time of writing, the sha256 option is only recommended for experimental and testing purposes and is not guaranteed to remain backward-compatible as Git continues to evolve.

Summary

We started with a brief overview of Git's core functionality and how to download Git's source code. We then checked out Git's initial commit and explored the init-db command and its source file, the ancestor of the present-day git init command.

We described how this command changed over the course of Git's 16-year history by checking out the first commit of every calendar year and diffing it against the prior year's commit.

Next Steps

Since you're interested in the history of Git's code, you might be interested in a technical history of version control systems, including Git and its early predecessors and descendants.

If you're interested in learning more about how Git works under the hood, check out our Baby Git Guidebook for Developers, which dives into Git's code in an accessible way. We wrote it for curious developers to learn how Git works at the code level. To do this, we documented the first version of Git's code and discussed it in detail.

Thanks and happy coding! We hope you enjoyed this article. If you have any questions or comments, feel free to reach out to jacob@initialcommit.io.
