Object files stored in .git/annex/objects
are each put in their own directory.
This allows the write bit to be removed from both the file and its directory,
which prevents accidentally deleting or changing the file contents.
The reasoning for doing this follows:
Normally with git, once you have committed a file, editing the file in the
working tree cannot cause you to lose the committed version. This is an
important property of git. Of course you can rm -rf .git and delete
commits if you like (before you've pushed them). But you can't lose a
committed version of the file because of something you do with the working
tree version.
It's easy for git to do this, because committing a file makes a copy of it. But git-annex does not make a local copy of a file added to it, because the file could be very large.
So, it's important for git-annex to find another way to preserve the expected property that once committed, you cannot accidentally lose a file. The most important protection it provides is simply to remove the write bit of the file, which prevents programs from modifying it.
But, that does not prevent any program that might follow the symlink and
delete the symlinked file. This might seem an unlikely thing for a program to
do at first, but consider a command like:
tar cf foo.tar foo --remove-files --dereference
When I tested this, I didn't know whether it would remove the file that foo was symlinked to or not! It turned out that my tar doesn't remove it. But it could easily have gone the other way.
Rather than needing to worry about every possible program that might
decide to do something like this, git-annex removes the write bit from the
directory containing the annexed object, as well as removing the write
bit from the file. (The only bad consequence of this is that rm -rf .git
doesn't work unless you first run chmod -R +w .git)
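
The effect of locking down both the file and its directory can be tried outside of git-annex. This is only a sketch: the directory layout, the KEY name, and the lock_object and mode_of helpers are made up for the demonstration and are not how git-annex itself is implemented.

```shell
# Hypothetical helper: remove the write bit from an object file
# and from the directory that contains it.
lock_object() {
    chmod a-w "$1" && chmod a-w "$(dirname "$1")"
}

# Print the permission string of a path as shown by ls.
mode_of() {
    ls -ld "$1" | cut -c1-10
}

# Build a throwaway layout with one object in its own directory.
demo=$(mktemp -d)
mkdir -p "$demo/objects/KEY"
echo "precious data" > "$demo/objects/KEY/KEY"
obj="$demo/objects/KEY/KEY"

lock_object "$obj"
mode_of "$obj"
mode_of "$demo/objects/KEY"

# For an ordinary user, rm now fails because the directory is not writable.
# (root ignores permission bits, so it would still be able to delete the file.)
rm -f "$obj" 2>/dev/null || echo "rm refused: directory is not writable"

# Cleaning up needs the write bits back first.
chmod -R +w "$demo" && rm -rf "$demo"
```

Note that the last line mirrors the chmod -R +w step mentioned above: without it, the rm -rf of the demo directory would fail for the same reason.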
It's known that this lockdown mechanism is incomplete. The worst hole in
it is that if you explicitly run chmod +w on an annexed file in the working
tree, this follows the symlink and allows writing to the file. It would be
better to make the files fully immutable. But most systems either don't
support immutable attributes, or only let root make files immutable.
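
The chmod hole can be reproduced with a plain symlink. The following is a sketch with made-up file names (foo, KEY), not an actual git-annex repository; it relies on chmod changing the pointed-to file when a symlink is named on the command line.

```shell
# Throwaway layout: a locked-down object and a symlink pointing at it.
demo=$(mktemp -d)
mkdir -p "$demo/objects/KEY"
echo "precious data" > "$demo/objects/KEY/KEY"
chmod a-w "$demo/objects/KEY/KEY" "$demo/objects/KEY"
ln -s "objects/KEY/KEY" "$demo/foo"

before=$(ls -l "$demo/objects/KEY/KEY" | cut -c1-10)

# chmod on the symlink changes the mode of the file it points to,
# so this makes the object itself writable again.
chmod +w "$demo/foo"

after=$(ls -l "$demo/objects/KEY/KEY" | cut -c1-10)
echo "before: $before"
echo "after:  $after"

# Restore write bits on the directory so the demo can be deleted.
chmod -R +w "$demo" && rm -rf "$demo"
```

The directory's missing write bit does not help here: changing a file's mode only requires owning the file, not write access to the directory it lives in.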
I'm using git-annex to store build artefacts in a remote bare repo. Some of these artefacts are used in subsequent builds, which clone the artefacts repo and use 'git annex get' to retrieve the artefacts of interest.
Unfortunately, I've had to add a little kludge along the following lines to my build script fragment:
This is necessary because I need to ensure that the cloned git repo is able to be deleted at all times (I'm using yocto/openembedded and it may want to delete the clone for a variety of reasons).
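
A kludge of this kind might look like the sketch below. It is not the poster's actual fragment; it assumes the workaround simply restores the write bits before deleting, as with the chmod -R +w .git step described on the page. The function name remove_annex_clone and the example path are hypothetical.

```shell
# Hypothetical helper: delete a clone whose annexed objects are locked down.
remove_annex_clone() {
    clone=$1
    # Nothing to do if the clone is already gone.
    [ -d "$clone" ] || return 0
    # Give the owner the write bit back everywhere, then delete as usual.
    chmod -R u+w "$clone" && rm -rf "$clone"
}

# Example call (the path is an assumption):
# remove_annex_clone "$WORKDIR/artefacts"
```

Running the chmod over the whole clone rather than just .git/annex/objects keeps the helper independent of where the locked directories happen to be.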
setcap cap_linux_immutable+ep /usr/bin/git-annex
After doing that, git-annex is able to make files immutable, so the additional directory is not needed any more. Even on file systems / in environments where that is not possible, in some situations file lookup speed is way more important than not being able to delete the target of a symlink.
I have no idea how to code in Haskell, so if somebody else could add an appropriate always/never/only-when-necessary config option I'd be very happy, and my media server would not have any more hiccups when switching songs …