Note: this is the reverse of migrating two separate disconnected directories to git annex.
I have a git annex repo for all my media that has grown to 57866 files. Git operations are getting slow, especially on external spinning hard drives, so I decided to split it into separate repositories.
This is how I did it, with some help from #git-annex. Suppose the old big repo is at ~/oldrepo:
# Create a new repo for photos only
mkdir ~/photos
cd ~/photos
git init
git annex init laptop
# Hardlink all the annexed data from the old repo
cp -rl ~/oldrepo/.git/annex/objects .git/annex/
# Regenerate the git annex metadata
git annex fsck --fast
# Also split the repo on the usb key
cd /media/usbkey
git clone ~/photos
cd photos
git annex init usbkey
cp -rl ../oldrepo/.git/annex/objects .git/annex/
git annex fsck --fast
# Connect the annexes as remotes of each other
git remote add laptop ~/photos
cd ~/photos
git remote add usbkey /media/usbkey/photos
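The cp -rl steps above do not duplicate any data: -l makes cp create hard links, so each object ends up with a second name pointing at the same file. Hard links cannot cross filesystems, which is why the usbkey copy is linked from the old repo on the same key. Here is a minimal sketch of that behaviour in a throwaway directory, assuming GNU cp; the directory layout and file name are made up for illustration and are not real git-annex object paths.

```shell
# Throwaway sandbox; none of these paths are real git-annex paths.
tmp=$(mktemp -d)
mkdir -p "$tmp/oldrepo/objects/aa" "$tmp/photos"
printf 'pretend this is a photo\n' > "$tmp/oldrepo/objects/aa/key1"

# Same invocation style as in the tip: recursive copy that hard links files.
cp -rl "$tmp/oldrepo/objects" "$tmp/photos/"

# -ef is true when both names refer to the same file (same device and inode).
if [ "$tmp/oldrepo/objects/aa/key1" -ef "$tmp/photos/objects/aa/key1" ]; then
    same_file=yes
else
    same_file=no
fi
echo "hard linked: $same_file"

# Removing one name leaves the content reachable through the other name.
rm "$tmp/photos/objects/aa/key1"
content=$(cat "$tmp/oldrepo/objects/aa/key1")
echo "old repo still has: $content"

rm -rf "$tmp"
```

This is also why dropping the unneeded hard links in one repo does not touch the content held by the others.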
At this point, I went through all repos doing standard cleanup:
# Remove unneeded hard links
git annex unused
git annex dropunused --force 1-12345
# Sync
git annex sync
To make sure nothing is missing, I used git annex find --not --in=here to see if, for example, the usbkey that should have everything could be missing something.
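With several repositories to check, the same command can be wrapped in a small helper. This is only a sketch: count_missing is a hypothetical function name, not part of git-annex, and it assumes file names without embedded newlines.

```shell
# Hypothetical helper (not part of git-annex): print how many annexed files
# in the repository at $1 have no content in that repository.
count_missing() {
    (
        cd "$1" || exit 1
        # Lists files whose content is not in the local repository,
        # one per line, then counts the lines.
        git annex find --not --in=here | wc -l | tr -d ' '
    )
}

# Usage, with the paths from this tip:
#   count_missing ~/photos
#   count_missing /media/usbkey/photos
```

A count of 0 on the usbkey would mean it really does hold everything; anything else is worth a closer look with the plain git annex find command.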
Update: Antoine Beaupré pointed me to this tip about Repositories with large number of files which I will try next time one of my repositories grows enough to hit a performance issue.
This document was originally written by Enrico Zini and added to this wiki by anarcat.
This is a simple way to split a repository, but the resulting split git repository will be larger than is really necessary.
When you dropunused all the hard links that are not present in the repository, git-annex will commit a log to the git-annex branch saying "I don't have this content" for each of them. That seems unnecessary, since the repository probably has no earlier log saying it contained the content that was hard linked into it, and git-annex could perhaps be improved to not record that unnecessarily, but that's what it does currently. So I suggest running git annex forget after the dropunused or at some later point. That will delete all traces of those log files from the git-annex branch.