The git annex sync command provides an easy way to keep several repositories in sync.
Often git is used in a centralized fashion, with a central bare repository that changes are pushed to and pulled from using normal git commands. That works fine if you don't mind having a central repository.
But it can be harder to use git in a fully decentralized fashion, with no central repository, and still keep repositories in sync with one another. You have to remember to pull from each remote, and merge the appropriate branch after pulling. Pushing to a remote is difficult too, since by default git refuses pushes into the currently checked out branch of a non-bare repository.
git annex sync makes it easier using a scheme devised by Joachim Breitner. The idea is to have a branch synced/master (actually, synced/$currentbranch), that is never directly checked out, and serves as a drop-point for other repositories to use to push changes.
When you run git annex sync, it merges the synced/master branch into master, receiving anything that's been pushed to it. (If there is a conflict in this merge, automatic conflict resolution is used to resolve it.) Then it fetches from each remote, and merges in any changes that have been made to the remotes too. Finally, it updates synced/master to reflect the new state of master, and pushes it out to each of the remotes.
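The drop-point idea can be tried out with plain git, without git-annex installed. The following is a minimal sketch, not what git annex sync runs internally: the repository names alice and bob, the file notes.txt, and the temporary directory are made up for illustration. It shows one repository pushing its master to the other's synced/master, and the other repository merging that branch later.

```shell
#!/bin/sh
# Sketch of the synced/master drop-point using plain git only.
# All names (alice, bob, notes.txt) are hypothetical.
set -e
work=$(mktemp -d)

# Create alice's repository with an initial commit on master.
git init -q "$work/alice"
git -C "$work/alice" symbolic-ref HEAD refs/heads/master
git -C "$work/alice" config user.name "Example User"
git -C "$work/alice" config user.email "user@example.com"
echo "first" > "$work/alice/notes.txt"
git -C "$work/alice" add notes.txt
git -C "$work/alice" commit -q -m "initial commit"

# bob is a normal (non-bare) clone, so his master is checked out.
git clone -q "$work/alice" "$work/bob"

# alice makes a change.
echo "second" >> "$work/alice/notes.txt"
git -C "$work/alice" commit -q -a -m "update notes"

# Pushing to bob's checked-out master would be refused by default,
# so alice pushes to the drop-point branch instead.
git -C "$work/alice" remote add bob "$work/bob"
git -C "$work/alice" push -q bob master:refs/heads/synced/master

# Later, bob picks up whatever was dropped off.
git -C "$work/bob" merge -q synced/master
cat "$work/bob/notes.txt"
```

git annex sync automates both halves of this exchange and, in addition, fetches from and pushes to every remote in one go.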
This way, changes propagate around between repositories as git annex sync is run on each of them. Not every repository needs to be able to talk to every other repository; as long as the graph of repositories is connected, and git annex sync is run from time to time on each, a given change, made anywhere, will eventually reach every other repository.
The workflow for using git annex sync is simple:
- Make some changes to files in the repository, using git-annex, or anything else.
- Run git annex sync to save the changes.
- Next time you're working on a different clone of that repository, run git annex sync to update it.
Note that by default, git annex sync only synchronises the git repositories, but does not transfer the content of annexed files. If you want to fully synchronise the content of two repositories, you can use git annex sync --content. You can also configure preferred content settings to make only some content be synced.
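As a rough sketch of the two options above, the helper functions below wrap the relevant commands. The function names are invented for this example, and the file patterns in the preferred content expression are placeholders; both commands assume they are run inside an existing git-annex repository.

```shell
#!/bin/sh
# Hypothetical helpers; run them from inside a git-annex repository.

# Sync the git branches and also transfer annexed file content.
sync_with_content() {
    git annex sync --content
}

# Set a preferred content expression for this repository ("." means
# the local repository). The patterns are only an illustration.
prefer_music_only() {
    git annex wanted . "include=*.mp3 or include=*.ogg"
}

# Example use (not executed here):
#   prefer_music_only && sync_with_content
```

With a preferred content expression in place, a content sync should only try to transfer the files that the expression asks for.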
I came upon git-annex a few months ago. I saw immediately how it could help with some frustrations I've been having. One in particular is keeping my vimrc in sync across multiple locations and platforms. I finally took the time to give it a try after I hit my boiling point this morning. I went through the walkthrough and now I have an annex everywhere I need it. git annex sync and my vimrc is up-to-date, simply grand!
Thanks so much for making git-annex, Daniel Wozniak
git annex copy --to bareremote. You could run that in cron. Or, the assistant can be run as a daemon, and automatically syncs git-annex data.
By default, git annex sync will sync to all remotes, unless you specify a remote. So, I have to specify, e.g., git annex sync origin. I can simplify this with aliases, I suppose, but I do a lot of teaching non-programmer scientists... so it'd be nice to be able to configure this (so beginning users don't have to keep track of as many things).
Is there (or will there be) a way to do this?
Just in case you haven't considered such a scenario - maybe you have suggestions for how to collaborate more effectively with git annex (and avoid warning messages):
I'm trying to teach beginning scientist programmers (mostly graduate students), and a common scenario is to fork some scientific code. I'd like forking on github to be mundane, and not trigger warnings, and generally have as little for folks to explicitly keep track of as possible (this seems to be a common concern we share, which leads you to prefer syncing to all remotes without the option to configure the default behavior!).
However, I am currently working with students on forking and fixing up scientific code where the upstream maintainer doesn't want to allow pushes upstream, except via pull request. So, part of our approach is to set up some common shared datasets in git annex (and these just end up in our fork). If we have an "upstream" remote, git annex will try to sync with it, and report an error.
So - that's why I'd like to be able to configure the deactivation of syncing to a defined remote (e.g., "upstream"). However, if you have other suggestions to smooth the workflow, I would also like to hear those!
@Dav what kind of url does the upstream remote have? Perhaps it would be sufficient to make sync skip trying to push to git:// and http[s]:// remotes. Both are unlikely to accept pushes, and in the cases where they do accept pushes it would be fine to need a manual git push.
Anyway, you can already configure which remotes get synced with. From the man page:
So git config remote.upstream.annex-sync false
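To illustrate that setting, here is a small sketch that applies it in a throwaway repository. The remote names and example.com URLs are placeholders, not anything from the thread, and only plain git config is exercised. As far as I understand the man page, a remote configured this way is skipped by a bare git annex sync but can still be synced by naming it explicitly.

```shell
#!/bin/sh
# Sketch: exclude one remote from git annex sync by default.
# The repository, remote names and URLs are placeholders.
set -e
repo=$(mktemp -d)
git init -q "$repo"

git -C "$repo" remote add origin https://example.com/me/project.git
git -C "$repo" remote add upstream https://example.com/upstream/project.git

# Key and value are separate arguments; "key=value" is not accepted.
git -C "$repo" config remote.upstream.annex-sync false

# Show the stored value.
git -C "$repo" config --get remote.upstream.annex-sync
```

Because this lives in the repository's own configuration, each clone has to set it once; after that, beginning users can keep running plain git annex sync.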