Maybe you had a lot of files scattered around on different drives, and you added them all into a single git-annex repository. Some of the files are surely duplicates of others.
While git-annex stores the file contents efficiently, it would still help in cleaning up this mess if you could find, and perhaps remove, the duplicate files.
Here's a command line that will show duplicate sets of files grouped together:
git annex find --include '*' --format='${file} ${escaped_key}\n' | \
sort -k2 | uniq --all-repeated=separate -f1 | \
sed 's/ [^ ]*$//'
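Because sort -k2 and uniq -f1 split lines on whitespace, a filename containing a space pushes the key into a later field, and the grouping above goes wrong. A minimal sketch of one workaround, assuming that keys themselves never contain spaces (normally the case for hash-based backends such as SHA256E): print the key first, then let awk strip it off so the rest of the line is kept verbatim as the filename. The name group_dups is a hypothetical helper, not something git-annex provides.

```shell
# group_dups (hypothetical helper, not part of git-annex):
# reads "KEY FILENAME" lines on stdin and prints each set of files that
# share a key, with a blank line between sets.
group_dups() {
    # LC_ALL=C makes sort compare bytes, so lines sharing a key stay adjacent
    LC_ALL=C sort | awk '
        function flush(    i) {
            # print the collected set only if it holds more than one file
            if (n > 1) {
                if (printed) print ""
                for (i = 1; i <= n; i++) print files[i]
                printed = 1
            }
        }
        {
            key = $1
            file = $0
            sub(/^[^ ]* /, "", file)   # strip "KEY " and keep the name verbatim
            if (key != prev) {         # a new key starts a new set
                flush()
                n = 0
                prev = key
            }
            files[++n] = file
        }
        END { flush() }
    '
}
```

It would be fed from the same find command with the two fields swapped: git annex find --include '*' --format='${escaped_key} ${file}\n' | group_dups. Filenames containing newlines would still break it.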
Here's a command line that will remove one file from each set of duplicates:
git annex find --include '*' --format='${file} ${escaped_key}\n' | \
sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//' | \
xargs -d '\n' git rm
--Joey
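The removal pipeline above deletes only one file per set (uniq --repeated prints just the first line of each group), so a set of three copies still leaves two behind. If the goal is to keep exactly one copy of each, a minimal sketch is to print every file except the first one for each key. It assumes the key is printed first and contains no spaces, so filenames with spaces survive intact; extra_copies is a hypothetical helper name.

```shell
# extra_copies (hypothetical helper, not part of git-annex):
# reads "KEY FILENAME" lines on stdin and prints every file except the
# first (in byte order) for each key, i.e. the copies that could be removed
# while still keeping one file per content.
extra_copies() {
    LC_ALL=C sort | awk '
        {
            key = $1
            file = $0
            sub(/^[^ ]* /, "", file)   # strip "KEY " and keep the name verbatim
            if (key == prev) print file   # 2nd, 3rd, ... file with this key
            prev = key
        }
    '
}
```

Reviewing the list first is prudent; after that, git annex find --include '*' --format='${escaped_key} ${file}\n' | extra_copies | xargs -d '\n' git rm would remove the extra copies. The surviving copy is simply whichever name sorts first.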
Spaces and other special characters can make filename handling ugly. If you don't need to keep the exact filenames, it might be easiest just to get rid of the problematic characters.
Maybe you can run something like this before checking for duplicates.
Is there any simple way to search for files with a given key?
At the moment, the best I've come up with is this:
where <KEY> is the key. This seems like an awfully long-winded approach, but I don't see anything in the docs indicating a simpler way to do it. Am I missing something?
@Chris I guess there's no really easy way, because searching for a given key is not something many people need to do.
However, git does provide a way. Try
git log --stat -S "$KEY"
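git log -S searches the history. For files in the current checkout, another minimal sketch is to filter git annex find output for an exact key match. It assumes the key is printed first and contains no spaces; files_with_key is a hypothetical helper name.

```shell
# files_with_key (hypothetical helper, not a git-annex command): reads
# "KEY FILENAME" lines on stdin and prints the filename of every line
# whose key is exactly the first argument.
files_with_key() {
    # pass the wanted key through the environment so awk does not
    # interpret backslash escapes in it (as awk -v would)
    want=$1 awk '
        $1 == ENVIRON["want"] {
            file = $0
            sub(/^[^ ]* /, "", file)   # strip "KEY " and keep the name verbatim
            print file
        }
    '
}
```

Usage, run from the top of the working tree: git annex find --include '*' --format='${escaped_key} ${file}\n' | files_with_key "$KEY"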