Git Submodule | Tracking sub-repos within your Git repo

Introduction

When you begin a new project, you may want to reuse preexisting code from an external Git repository. Several options exist for accomplishing this, such as copying and pasting the code from the external repository or using a language-specific package management system like NPM.

Despite their convenience, these methods have limitations, such as being unable to track changes made to the external repository. This is where the git submodule command comes in. A git submodule references another repository from within your own, so that you can incorporate its code. It is often the better option because it points to a snapshot of a specific commit in the other repository, and that snapshot changes only when you decide to update it.

Continue reading to learn how git submodule works, when it is most useful, and how to use it safely and effectively.

What is Git Submodule?

A Git submodule is a Git repository nested inside another Git repository, typically so that existing code can be reused in additional projects.

Git submodule allows you to embed a repository within your main repository. When you use git submodule, you are not adding the external repository's code to your main repository. Instead, you are adding information about the submodule: its URL and path, stored in a .gitmodules file, and a pointer to one specific commit.

Because only that pointer is stored, new commits in the external repository do not change the commit your main repository uses. The pointer moves only when you update it and commit that change. This is useful when the submodule's code keeps evolving but you want to stay on a snapshot you know works correctly with your current project.
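To see the pointer for yourself, here is a minimal sketch using throwaway local repositories. It assumes a POSIX shell with Git on the PATH; a local folder named old-project stands in for the external repository, and the names main and reuse-code simply follow this article's example. The script prints the submodule's tree entry, adds a new commit upstream, and shows that the commit recorded by the main repository stays where it was.

```shell
#!/bin/sh
# Sketch: a submodule is stored as a commit pointer (a "gitlink").
set -eu

# Throwaway workspace, deleted when the script exits.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Git 2.38.1+ refuses local-path submodule URLs by default. Allow them
# only here, because every repository in this sketch is local.
git config --global protocol.file.allow always

# Hypothetical local stand-in for the external repository.
git init -q old-project
echo "v1" > old-project/lib.txt
git -C old-project add lib.txt
git -C old-project commit -q -m "v1"

# Main project with old-project embedded as a submodule.
git init -q main
cd main
git submodule add "$WORK/old-project" reuse-code
git commit -q -m "Add reuse-code submodule"

# Mode 160000, type "commit": the tree holds a pointer, not files.
git ls-tree HEAD reuse-code

# A new upstream commit does not move the recorded pointer.
echo "v2" > "$WORK/old-project/lib.txt"
git -C "$WORK/old-project" commit -q -am "v2"
PINNED=$(git rev-parse HEAD:reuse-code)
UPSTREAM=$(git -C "$WORK/old-project" rev-parse HEAD)
echo "pinned:   $PINNED"
echo "upstream: $UPSTREAM"
```

The tree entry starts with 160000 commit, and the two hashes differ: the main repository keeps using the v1 commit until someone moves the pointer and commits that change.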

Alternatives to Git Submodule

Instead of using git submodule to reuse code, you have two main options. The first is simply copying code from the external repository and pasting it into your main repository. While this is a quick and easy way to reuse old code, it's not ideal, since you will miss any upstream changes unless you copy and paste the updated code too.

The second option is to use a language package management system like NPM for Node.js or RubyGems for Ruby. This can be a great option if you’re working in a single language and the code you need is already published and appropriately versioned. However, it is less flexible than git submodule, since you can only use the versions the maintainers publish rather than pointing at any commit in the external repository or working on its code directly.

Git Submodule Use Cases

Git submodule is a valuable command to have in your toolbox for a variety of situations. Not only does it allow you to reuse code from other projects, it does so by pinning a snapshot of a specific commit. This means work can continue in the repository you pulled the code from without breaking your current project.

Separating code is one of the most common uses of git submodules in large, complex projects. You can break up extensive codebases with multiple components to make them easier to handle or to delegate specific tasks for each component.

Each component can be added to your main project repository as a submodule. This creates a cleaner repository and allows you to continue to work on components without breaking the overall functionality of your project.

Additionally, you can reuse a component you've created across multiple projects. When you add your component as a submodule, you can easily update it in multiple repositories. This is more flexible and straightforward than copying and pasting, as you won’t need to periodically go back and paste in updated code for multiple projects.

How to Use Git Submodule?

Getting the hang of git submodule is relatively straightforward. Let’s discuss some of the most common tasks you will encounter when integrating this command into your git workflow.

Adding a Submodule

To add a submodule to your main repository, run git submodule add with the URL of the external repository followed by the folder the submodule should live in, for example ‘git submodule add git@github.com:url+to/first_submodule.git path_to_first_submodule’.

The general form is:

$ git submodule add <remote_url> <destination_folder>

Let’s look at an example. The following command adds the ‘old-project’ repository as a submodule in a folder named ‘reuse-code’ inside your main repository:

$ git submodule add https://github.com/old-project/old-project.git reuse-code

Cloning into '/home/user/main/reuse-code'...
remote: Enumerating objects: 1000, done.
remote: Total 1000 (delta 0), reused 0 (delta 0), pack-reused 1000
Receiving objects: 100% (1000/1000), 3.03 MiB | 3.38 MiB/s, done.
Resolving deltas: 100% (500/500), done.

This clones the external repository into a new folder named after the destination path you passed, in this example ‘reuse-code’. It also creates a ‘.gitmodules’ file (hidden because its name starts with a dot) that maps the submodule's path to its URL, and it registers the submodule in your local Git configuration file, .git/config.

If you run git status at this point, you will see two new entries waiting to be committed, the .gitmodules file and the reuse-code submodule itself:

$ git status 

On branch master
No commits yet

Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file:    .gitmodules
new file:    reuse-code
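Here is a minimal sketch of this step with throwaway local repositories. It assumes a POSIX shell and Git, and a local folder named old-project stands in for the GitHub URL. After git submodule add, it prints .gitmodules, reads individual keys back with git config -f, and lists the two staged entries in short form.

```shell
#!/bin/sh
# Sketch: what "git submodule add" stages in the main repository.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Needed only because the submodule URL in this sketch is a local path.
git config --global protocol.file.allow always

# Hypothetical local stand-in for the external repository.
git init -q old-project
echo "v1" > old-project/lib.txt
git -C old-project add lib.txt
git -C old-project commit -q -m "v1"

# Main project: add the submodule but do not commit yet.
git init -q main
cd main
git submodule add "$WORK/old-project" reuse-code

# .gitmodules is a plain text file in Git's config format.
cat .gitmodules
git config -f .gitmodules --get submodule.reuse-code.path
git config -f .gitmodules --get submodule.reuse-code.url

# Both entries are staged: "A  .gitmodules" and "A  reuse-code".
git status --short
```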

Commit and Push Submodule

Next, commit these staged changes in the main repository and push them:

$ git commit -m 'Added reuse-code submodule to repository'

$ git push

Retrieving the Submodules Code

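The repository where you ran git submodule add already has the submodule's code. In a fresh clone of the main repository, however, the submodule folders start out empty, so the submodules first need to be initialized with the git submodule init command: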

$ git submodule init 

This command adds the relevant entries to the repository's Git configuration file, so you can then run git submodule update as needed to retrieve the contents of the submodules. In other words, it copies the submodule information, such as each submodule's URL, from the .gitmodules file to the local configuration file (.git/config).

Because no paths were provided, git submodule init initializes every submodule listed in the main repository. If you only want to initialize a single submodule, pass its path, for example git submodule init reuse-code.
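The following sketch shows what git submodule init changes. It builds a main repository containing a reuse-code submodule, clones it, and reads submodule.reuse-code.url from the clone's configuration before and after running git submodule init reuse-code (the single-path form). Throwaway local repositories with hypothetical names and a POSIX shell are assumed.

```shell
#!/bin/sh
# Sketch: "git submodule init" copies .gitmodules data into .git/config.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Needed only because the submodule URL in this sketch is a local path.
git config --global protocol.file.allow always

# Hypothetical local stand-in for the external repository.
git init -q old-project
echo "v1" > old-project/lib.txt
git -C old-project add lib.txt
git -C old-project commit -q -m "v1"

git init -q main
git -C main submodule add "$WORK/old-project" reuse-code
git -C main commit -q -m "Add reuse-code submodule"

# A fresh clone: reuse-code exists but is an empty folder.
git clone -q main clone
cd clone
ls -A reuse-code

# Before init, .git/config has no entry for the submodule.
BEFORE=$(git config --get submodule.reuse-code.url || true)
echo "before init: ${BEFORE:-<not set>}"

# init copies the URL from .gitmodules into .git/config.
git submodule init reuse-code
echo "after init:  $(git config --get submodule.reuse-code.url)"

# update uses that URL to clone and check out the recorded commit.
git submodule update
```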

Pushing Updates

If you need to make changes to the submodule’s code, you can work in it as if it were a standalone repository, then push those changes using the normal Git workflow.

To update the submodule pointer to a different commit, start by changing into the submodule directory:

$ cd reuse-code/

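Next, use git checkout to point the submodule to the intended commit or branch. In this example, new-changes is a branch that already contains the commits you want the main repository to use:

$ git checkout new-changes

Switched to branch 'new-changes'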

After you change back to the main repository, git status shows that the submodule now points to a different commit:

$ git status

Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified: reuse-code (new commits)

Stage the submodule with git add reuse-code and then commit in the main repository. That commit replaces the old pointer with the commit currently checked out in the submodule, in this case the tip of new-changes.

All team members will then need to update the code of their submodules. Running git pull does not do this for you: it updates the main repository's pointer to the new submodule commit but leaves the files checked out in the submodule untouched. To check out the new commit, everyone should run the following command:

$ git submodule update

If you or other team members don’t run this command, the submodule stays checked out at the old commit, and git status keeps reporting modified: reuse-code (new commits), because the commit checked out in the submodule no longer matches the one the main repository records.
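Here is a sketch of the whole cycle with throwaway local repositories, assuming a POSIX shell and Git. A bare origin.git stands in for the shared remote, main is your working copy, and teammate is a hypothetical colleague's clone. You move reuse-code to a newer upstream commit and push the new pointer; the teammate's git pull then leaves the submodule reported as modified until git submodule update checks out the new commit.

```shell
#!/bin/sh
# Sketch: moving a submodule pointer and syncing a teammate's clone.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Needed only because the submodule URL in this sketch is a local path.
git config --global protocol.file.allow always

# Hypothetical local stand-in for the external repository.
git init -q old-project
echo "v1" > old-project/lib.txt
git -C old-project add lib.txt
git -C old-project commit -q -m "v1"

# Shared remote plus your working copy with the submodule.
git init -q --bare origin.git
git init -q main
cd main
git remote add origin "$WORK/origin.git"
git submodule add "$WORK/old-project" reuse-code
git commit -q -m "Add reuse-code submodule"
git push -q origin master

# The teammate clones and fetches the submodule code.
cd "$WORK"
git clone -q origin.git teammate
git -C teammate submodule init
git -C teammate submodule update

# Upstream gains a commit.
echo "v2" > old-project/lib.txt
git -C old-project commit -q -am "v2"

# You: check out the new commit inside the submodule...
cd "$WORK/main/reuse-code"
git fetch -q origin
git checkout -q origin/master
# ...then stage, commit, and push the new pointer from main.
cd "$WORK/main"
git status --short
git add reuse-code
git commit -q -m "Point reuse-code at v2"
git push -q origin master

# Teammate: pull brings the new pointer but not the new code.
cd "$WORK/teammate"
git pull -q --ff-only
AFTER_PULL=$(git status --porcelain)
echo "after pull: $AFTER_PULL"
git submodule update          # checks out the recorded v2 commit
cat reuse-code/lib.txt
```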

How to Clone Project Repositories with Submodules

Cloning a repository that has submodules is a very simple process. Just clone the repository and then follow through with the initialization and update process outlined below.

Use the git clone command to clone the main repository, then change into its folder.

$ git clone <remote_url>

Next, use the git submodule init command to initialize the submodule in the cloned repository.

$ git submodule init

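Finally, use git submodule update to fetch the submodule code and check out the commit recorded in the main repository. Note that this leaves the submodule on a detached HEAD at that commit rather than on a branch.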

$ git submodule update
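Here is a minimal sketch of the clone, init, update sequence, using throwaway local repositories in place of a hosted remote (a POSIX shell and Git are assumed; the folder names are hypothetical). It also checks that the submodule ends up on a detached HEAD at exactly the commit the main repository records. As a shortcut, Git also accepts git submodule update --init, which performs the last two steps in one command.

```shell
#!/bin/sh
# Sketch: clone a repository with a submodule, then init and update.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Needed only because the submodule URL in this sketch is a local path.
git config --global protocol.file.allow always

# Hypothetical local stand-in for the external repository.
git init -q old-project
echo "v1" > old-project/lib.txt
git -C old-project add lib.txt
git -C old-project commit -q -m "v1"

# Main repository that already contains the submodule.
git init -q main
git -C main submodule add "$WORK/old-project" reuse-code
git -C main commit -q -m "Add reuse-code submodule"

# Clone, then initialize and update the submodule.
git clone -q main main-clone
cd main-clone
git submodule init
git submodule update

# Detached HEAD: prints "HEAD" rather than a branch name.
git -C reuse-code rev-parse --abbrev-ref HEAD
# Same commit the main repository points to.
git -C reuse-code rev-parse HEAD
git rev-parse HEAD:reuse-code
```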

How to Merge Submodule Changes

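If the repository the submodule's code came from has new commits, you can bring them into your submodule with the following command:

$ git submodule update --remote --merge

This command fetches the latest commit on the submodule's remote branch and merges it into the branch currently checked out in the submodule. The main repository then shows the submodule as modified, so commit the new pointer there.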
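This sketch tries git submodule update --remote --merge on throwaway local repositories (POSIX shell and Git assumed; names are hypothetical). No submodule.<name>.branch setting is configured, so Git follows the remote's default branch, master here. After the merge, the main repository reports the submodule as changed, and the new pointer is committed.

```shell
#!/bin/sh
# Sketch: pull upstream submodule commits with --remote --merge.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Needed only because the submodule URL in this sketch is a local path.
git config --global protocol.file.allow always

# Hypothetical local stand-in for the external repository.
git init -q old-project
echo "v1" > old-project/lib.txt
git -C old-project add lib.txt
git -C old-project commit -q -m "v1"

git init -q main
cd main
git submodule add "$WORK/old-project" reuse-code
git commit -q -m "Add reuse-code submodule"

# Upstream moves on.
echo "v2" > "$WORK/old-project/lib.txt"
git -C "$WORK/old-project" commit -q -am "v2"

# Fetch the submodule's remote branch and merge it into the branch
# checked out in the submodule (master, left there by "add").
git submodule update --remote --merge

# main now sees a new submodule commit; record it.
git status --short
git add reuse-code
git commit -q -m "Update reuse-code to latest upstream"
```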

Submodule Tricks

We previously discussed how crucial it is to initialize and update your submodules, and it is also easy to forget to do so. Fortunately, Git offers a few options to help you stay on track.

The git clone --recurse-submodules command clones your repository and also initializes and checks out every submodule it contains.

The git pull --recurse-submodules command pulls the main repository and then updates its submodules to the commits that the main repository records.
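Both options can be tried on throwaway local repositories (POSIX shell and Git assumed; the folder names are hypothetical). Below, git clone --recurse-submodules yields a clone whose submodule is already populated. After the original repository moves the submodule pointer, git pull --recurse-submodules in the clone brings the submodule to the new commit without a separate git submodule update.

```shell
#!/bin/sh
# Sketch: --recurse-submodules on clone and pull.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Needed only because the submodule URL in this sketch is a local path.
git config --global protocol.file.allow always

# Hypothetical local stand-in for the external repository.
git init -q old-project
echo "v1" > old-project/lib.txt
git -C old-project add lib.txt
git -C old-project commit -q -m "v1"

git init -q main
git -C main submodule add "$WORK/old-project" reuse-code
git -C main commit -q -m "Add reuse-code submodule"

# One command instead of clone + init + update.
git clone -q --recurse-submodules main copy
cat copy/reuse-code/lib.txt

# Upstream gains a commit and main moves its pointer to it.
echo "v2" > old-project/lib.txt
git -C old-project commit -q -am "v2"
git -C main/reuse-code pull -q --ff-only
git -C main add reuse-code
git -C main commit -q -m "Point reuse-code at v2"

# Pull main and update the checked-out submodule in one step.
git -C copy pull -q --ff-only --recurse-submodules
cat copy/reuse-code/lib.txt
```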

Additionally, if your project has multiple submodules, you might want to run the same command in all of them. The foreach subcommand loops over all of your checked-out submodules and runs the given command in each one:

$ git submodule foreach <command>
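As an illustration, this sketch gives a throwaway main repository two hypothetical submodules, lib-a and lib-b, and runs commands in both with git submodule foreach (POSIX shell and Git assumed). Git runs the command once per checked-out submodule, from inside it, and sets variables such as $name, $sm_path, and $sha1; the single quotes stop your own shell from expanding them first.

```shell
#!/bin/sh
# Sketch: run one command in every submodule with foreach.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Keep your real Git config out of it and use a demo identity.
export HOME="$WORK"
unset XDG_CONFIG_HOME
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Needed only because the submodule URLs in this sketch are local paths.
git config --global protocol.file.allow always

# Two hypothetical local stand-ins for external repositories.
for lib in lib-a lib-b; do
  git init -q "$lib"
  echo "$lib" > "$lib/README"
  git -C "$lib" add README
  git -C "$lib" commit -q -m "Initial $lib"
done

git init -q main
cd main
git submodule add "$WORK/lib-a" lib-a
git submodule add "$WORK/lib-b" lib-b
git commit -q -m "Add two submodules"

# Git prints "Entering '<path>'" before each run.
git submodule foreach 'echo "$sm_path is at $sha1"'
git submodule foreach 'git log --oneline -1'
```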

Summary

Git submodule is a very useful tool for reusing code from other projects safely and effectively. Best of all, it doesn’t have to be your own code. The way git submodule snapshots a commit protects your project from upstream changes so that you can reuse code from libraries or open-source projects.

Compared to more cumbersome methods like copying and pasting, git submodule is significantly more efficient for reusing existing code. Now that you have everything you need to get started, jump in and try out git submodule on your next project!

Next steps

If you want to learn more about how repositories and the object database work within Git, check out Baby Git Guidebook for Developers. Baby Git provides an accessible primer on Git’s original codebase to help you learn how Git works at the code level.

We hope you enjoyed this post! Feel free to shoot me an email at jacob@initialcommit.io with any questions or comments.

Final Notes