Git Submodule · Subtree · LFS — repos inside repos

Published 2026-04-28 · Updated 2026-05-18

When a repo needs to carry another repo's code, or when large binary files have to live inside a repo, the options that come to mind are Submodule, Subtree, Sparse Checkout, and Git LFS. Each has different strengths and traps. This article covers their origins, behavior, common pitfalls, and why monorepos tend to avoid submodules.

1. About the four tools

| Tool | Origin | Problem solved |
|---|---|---|
| Git Submodule | Git 1.5.3 (2007) | Include an external repo pinned at a specific commit |
| Git Subtree | git-subtree written in 2009, shipped in Git's contrib/ since 1.7.11 (2012) | Merge an external repo into a directory; its history blends into the parent |
| Sparse Checkout | Git 1.7 (2010), big improvements in v2.25 | Check out only part of a large repo |
| Git LFS | GitHub, 2015 | Store large binary files in a separate store |

They each address different problems. There's almost no situation where all four apply.

2. Submodule

The parent repo holds only a reference to the child repo. The child's actual files live in the child repo, and the parent only carries metadata that says "this directory is the child repo at SHA xxx".

# Add
git submodule add https://github.com/foo/lib.git vendor/lib
# Creates .gitmodules and clones the child into vendor/lib

# Pull on clone
git clone --recurse-submodules https://github.com/me/parent.git

# Pull submodules into an already-cloned repo
git submodule update --init --recursive

# Bump the child
git submodule update --remote vendor/lib
git add vendor/lib && git commit -m "bump lib"

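.gitmodules is the single source of truth (SSOT) for the path-to-URL mapping. The pinned commit itself is not stored there; it lives in the parent's tree as a special entry for the submodule path: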

[submodule "vendor/lib"]
    path = vendor/lib
    url = https://github.com/foo/lib.git
    branch = main

| Strengths | Weaknesses |
|---|---|
| Child repo's history stays separate | Clone and checkout flow is involved |
| Pin to an exact commit | New collaborators get confused often |
| Permission separation (mix private and public) | Extra CI configuration required |
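
To make "the parent only carries metadata" concrete, here is a minimal sketch that builds two throwaway local repos and inspects the entry the parent stores for the submodule. The temp directory, the demo identity, and the local lib repo are all assumptions for illustration, standing in for https://github.com/foo/lib.git. Recent Git versions refuse local-path submodule clones by default, so the sketch passes protocol.file.allow=always for this demo only.

```shell
set -eu
# Throwaway sandbox; every path and name here is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny "child" repo standing in for the external library.
git init -q "$work/lib"
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "init lib"
child_sha=$(git -C "$work/lib" rev-parse HEAD)

# The parent repo adds the child as a submodule at vendor/lib.
git init -q "$work/parent"
git -C "$work/parent" commit -q --allow-empty -m "init parent"
git -C "$work/parent" -c protocol.file.allow=always \
    submodule --quiet add "$work/lib" vendor/lib
git -C "$work/parent" commit -q -m "add lib submodule"

# The parent's tree holds one entry for vendor/lib: a mode, a type, and a SHA.
entry=$(git -C "$work/parent" ls-tree HEAD vendor/lib)
echo "$entry"
mode=$(echo "$entry" | awk '{print $1}')
pinned_sha=$(echo "$entry" | awk '{print $3}')
```

The entry's mode is 160000 and its type is commit, which is how Git marks "this path is another repo at this SHA". Bumping the child means changing that one SHA in the parent.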

3. Subtree

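The child repo's contents are physically merged into one directory of the parent. The child's commits land in the parent's history; with --squash, each import collapses into a single squashed commit instead of the child's full log.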

# Add
git subtree add --prefix=vendor/lib https://github.com/foo/lib.git main --squash

# Pull child changes
git subtree pull --prefix=vendor/lib https://github.com/foo/lib.git main --squash

# Push parent changes back to the child
# (origin-lib is a remote alias; create it once with
#  git remote add origin-lib https://github.com/foo/lib.git)
git subtree push --prefix=vendor/lib origin-lib main

| Strengths | Weaknesses |
|---|---|
| Cloners get all the code without extra commands | Parent history grows large |
| Plain clone and pull just work | Pushing back to the child is finicky |
| No separate metadata like .gitmodules | Child history mingles into the parent |

4. Sparse Checkout

Checks out only some directories of a large monorepo. Git v2.25 added the dedicated git sparse-checkout command, which made the feature far easier to use:

git clone --filter=blob:none --no-checkout https://github.com/big/mono.git
cd mono
git sparse-checkout init --cone
git sparse-checkout set apps/web packages/ui
git checkout main

--cone mode operates only at directory granularity — fast and safe. The older non-cone mode supports glob patterns but suffers from performance and stability issues.

Unlike submodules, this does not pull in an external repo; it checks out part of the same repo.
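
The effect of cone mode can be tried without a real monorepo. The sketch below is an assumption-laden local demo: it builds a hypothetical mini-monorepo in a temp directory, clones it in full, and then narrows the working tree. It skips --filter=blob:none because partial clone needs a server that supports it, so this shows only the working-tree side of the flow above.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical mini-monorepo with three areas plus a root-level file.
git init -q "$work/mono"
for dir in apps/web apps/api packages/ui; do
  mkdir -p "$work/mono/$dir"
  echo "$dir" > "$work/mono/$dir/README"
done
echo "root" > "$work/mono/README"
git -C "$work/mono" add .
git -C "$work/mono" commit -q -m "init mono"

# Full clone first, then narrow the working tree in cone mode.
git clone -q "$work/mono" "$work/clone"
git -C "$work/clone" sparse-checkout init --cone
git -C "$work/clone" sparse-checkout set apps/web packages/ui

# Show which directories are in the cone and what is left on disk.
git -C "$work/clone" sparse-checkout list
ls "$work/clone" "$work/clone/apps"
```

Root-level files stay checked out in cone mode, while apps/api disappears from the working tree even though it is still in the repo's history.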

5. Git LFS

Git stores a complete new copy of a large binary (images, video, ML weights) every time the file changes, and every clone downloads all of those copies, so a repo balloons into the GB range fast. LFS keeps those files on a separate LFS server, leaving only small pointer files in the repo:

# One-time install
git lfs install

# Add tracking
git lfs track "*.psd"
git lfs track "models/*.safetensors"
# .gitattributes is updated
git add .gitattributes

# add · commit · push as usual
git add design.psd
git commit -m "add design"
git push

.gitattributes is the single source of truth (SSOT) for which paths go through LFS:

*.psd filter=lfs diff=lfs merge=lfs -text
models/*.safetensors filter=lfs diff=lfs merge=lfs -text
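
To give a feel for how small the pointer files are, the sketch below reconstructs the three-line pointer layout by hand for a stand-in file. It does not call git-lfs at all; the file name and contents are hypothetical, and the layout follows the Git LFS pointer format (version line, SHA-256 oid, size in bytes).

```shell
set -eu
work=$(mktemp -d)
# Hypothetical stand-in for a large binary.
printf 'pretend this is a huge PSD' > "$work/design.psd"

# The content hash and byte size are all the pointer records about the file.
if command -v sha256sum >/dev/null 2>&1; then
  oid=$(sha256sum "$work/design.psd" | awk '{print $1}')
else
  oid=$(shasum -a 256 "$work/design.psd" | awk '{print $1}')
fi
size=$(wc -c < "$work/design.psd" | tr -d ' ')

# Assemble the same three lines a pointer file contains.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
echo "$pointer"
```

Whatever the size of the real file, the text committed to the repo stays at roughly this length; the binary itself is fetched from the LFS store by its oid.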

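GitHub's free tier has long been quoted as 1 GB of storage and 1 GB / month of bandwidth, with paid data packs beyond that. Quotas and billing have changed over time, so check GitHub's current LFS billing documentation before relying on these numbers.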

6. At a glance

| Scenario | Recommendation |
|---|---|
| Pin an external library at an exact commit | Submodule (or just a package manager if that's enough) |
| Fork-style: merge external code into your own repo | Subtree |
| Pull only part of one giant monorepo | Sparse checkout |
| Large binaries (images, video, model weights) | LFS |

7. Other paths

Beyond the four:

  • Package managers — npm, pnpm, Cargo, and Maven are usually the more natural place for dependency management. Question whether a submodule is really the answer.
  • Monorepo tools — Nx, Turborepo, Bazel, Buck. Many packages in one repo, with build and cache management.
  • Vendoring — Wholesale-copying external code as if it were your own. You track child updates by hand.
  • Workspaces — pnpm / npm / Yarn workspaces. Auto-link dependencies between packages in the same repo.

8. Common pitfalls

Submodule

  • New collaborator runs git clone and stops there — vendor/lib is empty. The README must mention --recurse-submodules or git submodule update --init.
  • Pushing the child without bumping the parent — teammates' machines see the child at an old SHA. Treat it as one unit: push the child, run git add vendor/lib and commit in the parent, then push the parent.
  • CI submodule fetch — GitHub Actions' actions/checkout defaults to submodules: false. Set submodules: recursive explicitly.
  • detached HEAD — submodule directories sit at detached HEAD by default. Check out a branch inside the submodule before committing there; commits made on a detached HEAD are easy to lose.
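
The first and last pitfalls above can be reproduced locally. This is a sketch under the same assumptions as any local submodule demo: hypothetical temp repos, a demo identity, and protocol.file.allow=always because recent Git versions block local-path submodule clones by default.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical child and parent repos.
git init -q "$work/lib"
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "init lib"
git init -q "$work/parent"
git -C "$work/parent" commit -q --allow-empty -m "init parent"
git -C "$work/parent" -c protocol.file.allow=always \
    submodule --quiet add "$work/lib" vendor/lib
git -C "$work/parent" commit -q -m "add lib submodule"

# Pitfall 1: a plain clone leaves vendor/lib empty.
git clone -q "$work/parent" "$work/clone"
before_files=$(ls -A "$work/clone/vendor/lib")
status_before=$(git -C "$work/clone" submodule status)
echo "before: $status_before"   # a leading "-" marks an uninitialized submodule

# The fix: initialize and fetch the submodule.
git -C "$work/clone" -c protocol.file.allow=always \
    submodule --quiet update --init --recursive
status_after=$(git -C "$work/clone" submodule status)
echo "after: $status_after"

# Pitfall 4: the submodule is checked out at a detached HEAD.
head_state=$(git -C "$work/clone/vendor/lib" rev-parse --abbrev-ref HEAD)
echo "submodule HEAD: $head_state"
```

In git submodule status, a leading "-" means not initialized and a leading "+" means the checked-out commit differs from the SHA the parent pins, which makes it a quick check to put in a README or a CI step.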

Subtree

  • Bloated parent history — frequent pulls without --squash grow history fast.
  • Pushing to the child takes practice — having one person own that flow is safer.

Sparse checkout

  • Non-cone mode pitfalls — patterns can mis-match and only show some files. Stick to --cone mode.
  • CI checking out everything — CI usually does a full checkout by default, so builds there are heavier than local unless the pipeline configures sparse checkout too.

LFS

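  • Existing large files after first tracking — git lfs track only affects files added from then on; it doesn't move files in past commits. git lfs migrate import is needed, and because it rewrites history, the team has to coordinate the force-push.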
  • LFS objects on fork — some hosts don't carry LFS objects when forking.
  • Free quota — GitHub's 1 GB / month bandwidth is gone after one video. Plan ahead.
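
Before deciding whether a migration is needed, it helps to see which large blobs are already sitting in history as regular Git objects. The following is a rough sketch using only core Git plumbing; the temp repo, the file names, and the 100000-byte threshold are assumptions chosen for the demo.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical repo where a binary was committed before any LFS tracking.
git init -q "$work/repo"
head -c 200000 /dev/zero > "$work/repo/old-video.bin"
echo "small" > "$work/repo/notes.txt"
git -C "$work/repo" add .
git -C "$work/repo" commit -q -m "binary committed before LFS tracking"

# List every object reachable from any ref, look up its type and size,
# and keep only blobs at or above the threshold (in bytes).
threshold=100000
big_blobs=$(git -C "$work/repo" rev-list --objects --all |
  git -C "$work/repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v min="$threshold" '$1 == "blob" && $2 >= min {print $2, $3}')
echo "$big_blobs"
```

Anything this prints is stored as an ordinary blob and will keep being downloaded by every clone until history is rewritten, regardless of what .gitattributes says today.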

9. Why monorepos avoid submodules

In a monorepo holding many packages from the same company / team, submodules cause friction:

  • A single PR ends up touching two repos (parent and child).
  • CI cache and dependency graph fragment.
  • The clone, init, update steps create a steep entry barrier for new people.

Workspaces in a package manager, or Nx / Turborepo, are typically recommended instead. Submodules fit better when pinning external OSS or where permissions / licensing must stay separate.

Closing thoughts

Submodule, Subtree, Sparse Checkout, and LFS are often discussed together, but each solves a different problem. In a monorepo, workspaces (pnpm, Yarn) or monorepo tools (Turborepo, Nx) cause less friction than submodules. For large binaries, LFS is nearly the standard. Before picking a tool, question whether it's actually needed in this spot.

References include git-submodule, Pro Git Submodules, git-subtree, git-sparse-checkout, Git LFS, GitHub LFS quotas, Atlassian — Git Submodules vs Subtree, GitHub Blog — partial clone and shallow clone, Nx, Turborepo, Bazel, and pnpm Workspaces.
