2.1.6.1. Git LFS, Shallow Clones, and History Cleanup
Large binary files (images, compiled assets, database dumps) fundamentally break Git's efficiency model. Git stores every version of every file, so a 10GB repository of binaries balloons with each change. Git LFS (Large File Storage) replaces binary files with lightweight pointer files of roughly 130 bytes and stores the actual content on a separate LFS server. Cloning downloads only the pointers; the real files are fetched on demand at checkout. Configure .gitattributes to track patterns (*.psd, *.dll, *.zip) before committing large files. For CI pipelines that only need the latest commit, shallow clones (git clone --depth 1) can cut clone time dramatically, often by 80% or more. When large files were accidentally committed without LFS, git filter-repo rewrites history to remove them permanently, but because this changes commit hashes it requires team coordination and a force-push.
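As a sketch, cleaning an accidentally committed large file out of history with git filter-repo might look like the following. The repository URL and file path are hypothetical; git-filter-repo is a separate install, not part of core Git.

```shell
# git-filter-repo must be installed separately (e.g. via pip or a package manager).
# Always work on a fresh clone: filter-repo refuses to run on a non-fresh clone
# by default as a safety measure.
git clone https://example.com/org/repo.git
cd repo

# Remove every historical version of one file (hypothetical path)
git filter-repo --path assets/dump.sql --invert-paths

# Or drop all blobs above a size threshold
git filter-repo --strip-blobs-bigger-than 10M

# filter-repo deletes the origin remote as a safety measure; re-add it,
# then force-push the rewritten history. Every affected commit hash changes,
# so the whole team must re-clone or hard-reset afterwards.
git remote add origin https://example.com/org/repo.git
git push --force origin main
```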
Git performance degrades predictably with repository size. The three main growth vectors are: large binary files (media, compiled assets), deep history (years of commits), and broad scope (monorepo with thousands of files). Each has a different solution.
For binary files, Git LFS redirects storage to a dedicated server while keeping pointer files in the repository. Running git lfs track "*.psd" writes the corresponding rule into .gitattributes (e.g. *.psd filter=lfs diff=lfs merge=lfs -text); once that file is committed, new files matching the pattern are automatically stored in LFS. CI pipelines should run git lfs install to set up the LFS filters, and configure the LFS fetch to pull only the files needed for the build, not the entire LFS history.
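A minimal setup for the patterns mentioned above might look like this sketch (the commit message is illustrative):

```shell
# One-time setup per machine: installs the LFS smudge/clean filters and hooks
git lfs install

# Track binary patterns; each pattern is written into .gitattributes
git lfs track "*.psd" "*.dll" "*.zip"

# .gitattributes now contains entries such as:
#   *.psd filter=lfs diff=lfs merge=lfs -text

# Commit .gitattributes BEFORE adding any large files;
# files committed earlier still live in regular Git storage
git add .gitattributes
git commit -m "Track design assets and build artifacts with Git LFS"

# List the currently tracked patterns to verify
git lfs track
```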
For deep history, shallow clones (--depth N) and treeless clones (--filter=tree:0) reduce clone time dramatically. Azure Pipelines supports fetchDepth configuration. For broad scope, sparse checkout allows developers to clone the repository but only populate the working tree with specific directories — ideal for monorepos where each developer works on one service.
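Combining a treeless partial clone with sparse checkout might look like this sketch (the repository URL and service directory are hypothetical):

```shell
# Treeless partial clone: commits are downloaded up front, while trees and
# blobs are fetched on demand; --sparse starts with an empty-ish working tree
git clone --filter=tree:0 --sparse https://example.com/org/monorepo.git
cd monorepo

# Populate the working tree with only one service's directory
git sparse-checkout set services/payments

# For a CI job that only needs the tip commit, a shallow clone is simpler:
git clone --depth 1 https://example.com/org/monorepo.git
```

The same effect in Azure Pipelines is achieved declaratively via the checkout step's fetchDepth setting rather than hand-written git commands.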
For CI agents, the GIT_LFS_SKIP_SMUDGE=1 environment variable prevents automatic LFS file download during clone, then git lfs pull selectively fetches only the files needed for the current build. This optimization matters when the LFS store contains terabytes of assets but the build only needs a few megabytes. Azure Pipelines supports LFS natively with the lfs: true checkout option.
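On a CI agent, the skip-smudge pattern might look like this sketch (the repository URL and include path are hypothetical):

```shell
# Clone without downloading LFS content: the smudge filter is skipped, so the
# working tree contains ~130-byte pointer files instead of the binaries
GIT_LFS_SKIP_SMUDGE=1 git clone https://example.com/org/repo.git
cd repo

# Selectively fetch and check out only the LFS objects this build needs
git lfs pull --include="assets/textures/**"

# Everything else remains a pointer file; inspect the LFS-managed paths with:
git lfs ls-files
```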
Scalar, Microsoft's successor to VFS for Git, provides an alternative for extremely large repositories. Where VFS for Git mounted the repo as a virtual filesystem and downloaded objects on demand, Scalar works with stock Git: it enables partial clone and sparse checkout and schedules background maintenance, and it now ships with recent versions of Git as the scalar command.
Git LFS bandwidth and storage count against Azure DevOps limits. Monitor usage through Organization Settings to avoid unexpected throttling on large teams.