Recent comments posted to this site:
Found a remote.cache.fetch setting that will prevent most accidents, though of course the determined footgun script may find a way.
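As a rough sketch of how such a setting can behave, the following uses plain git in throwaway repositories. The refspec value is an assumption (this thread does not spell it out); any fetch refspec naming a ref that does not exist on the cache should have a similar effect. The temporary paths and the demo identity are made up for the example.

```sh
set -eu

# Throwaway repositories standing in for the cache and a working clone.
tmp=$(mktemp -d)
git init -q "$tmp/cache"
git -C "$tmp/cache" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "cache state"
git init -q "$tmp/work"
cd "$tmp/work"

git remote add cache "$tmp/cache"

# Assumed value: a refspec whose source ref does not exist on the cache,
# replacing the default +refs/heads/*:refs/remotes/cache/* refspec.
git config remote.cache.fetch do-not-fetch-from-this-remote:

# A "fetch everything" alias would do something like this; the fetch from
# the cache is expected to fail, so the exit status is ignored here.
git fetch --all >/dev/null 2>&1 || true

# No tracking refs for the cache should have been created.
cache_refs=$(git for-each-ref refs/remotes/cache)
echo "cache tracking refs: [$cache_refs]"
```

With the default refspec left in place, the same fetch would create refs/remotes/cache/* branches, which is what git-annex merge would then see.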
There are just too many ways the user could bypass such protections. Including, for example, configuring git to fetch from cache to origin/ tracking branches.
My concern is not really about making it impossible, but about making it unlikely or avoidable. Similarly, you cannot completely prevent someone from merging the git-annex branch "manually" using regular git-merge with some -Sours to "avoid" the conflicts. That is unavoidable but very unlikely ;) ATM my problem is "likely" (as likely as me, the first user of the feature, running into it right away) and "unavoidable" (annex merge has no option/mode to avoid merging those). As long as we could avoid it somehow in those situations (e.g. by providing some option to annex merge), it would be great. My concern is that we cannot avoid it at all.
make it dead and use git-annex forget --drop-dead
Yep, we will add that information to some FAQ etc.; very useful. But it might be a bit too late if we share that blown-up git-annex branch publicly and people merge it into their own git-annex branches. If someone is advanced enough to configure git with alternative fetch settings, they could indeed resort to this.
I fear that preventing merging of branches fetched from the cache remote in git-annex would be a game of whack-a-mole. There are just too many ways the user could bypass such protections. Including, for example, configuring git to fetch from cache to origin/ tracking branches.
I remember at some point discussing isolating repos from one-another so that data from one repo can't leak across a boundary to another repo, while still having it be a remote, and it was similarly just not tractable. Can't seem to find the thread, but it's basically the same problem.
If you do accidentally merge the git-annex branch from a cache remote, you can always make it dead and use git-annex forget --drop-dead.
If you really want to avoid any possibility of git fetching from the caching remote, make it a directory special remote! But, there is not currently any way to make annex.hardlink work for directory special remotes, so it will be less efficient.
The -J2 web bug was not related to caching remotes at all but was an accidental sort by remote uuid rather than cost. I've fixed it.
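To illustrate the kind of mix-up described (this is not git-annex's actual code), here is a small shell sketch that orders the same two remotes first by cost and then by uuid. The cache uuid is made up; the web uuid shown is, as far as I know, the fixed one git-annex uses for the web special remote, but treat it as an assumption too.

```sh
# Fixed locale so the uuid comparison is a plain byte-wise sort.
export LC_ALL=C

# Columns: cost, uuid, remote name. The cache uuid is hypothetical.
remotes='200.0 00000000-0000-0000-0000-000000000001 web
10.0 f3a1c2d4-0000-0000-0000-000000000002 cache'

# Intended order: cheapest remote first (numeric sort on the cost column).
by_cost=$(printf '%s\n' "$remotes" | sort -k1,1n | head -n1 | cut -d' ' -f3)

# Accidental order: sorted on the uuid column instead.
by_uuid=$(printf '%s\n' "$remotes" | sort -k2,2 | head -n1 | cut -d' ' -f3)

echo "first remote when sorted by cost: $by_cost"
echo "first remote when sorted by uuid: $by_uuid"
```

Under these assumptions the cost ordering tries the cache first, while the uuid ordering tries the web first regardless of its higher cost.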
Well, git-annex merge does not fetch, it only merges refs it sees.
That is correct! My alias to fetch all remotes (useful for quickly catching up on the current state of development in others' feature branches) fetched the cache as well. Despite the viral nature of git tags, I consider it to be a good general approach. But fetching is not merging -- I can remove any of those remotes at any moment should some remote become too heavy or something like that (tags are trickier).
IMHO annex merge should also not merge those remotes which are not "pullable" by default. Maybe it could take remote name(s) as its argument(s) to merge only the specified ones (ATM arguments seem to be silently ignored), in case someone really needs to merge any of those. That would prevent an accidental blow-up of the git-annex branch in case the cache remote gets fetched.
Well, git-annex merge does not fetch, it only merges refs it sees. With the configuration I gave in the tip, you will not have a cache/git-annex branch for it to merge.
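One way to check that premise in a given clone is to look for the tracking ref directly. The sketch below does that in a throwaway repository using plain git; the helper name is hypothetical, and the update-ref call only simulates what a fetch of the cache's git-annex branch would leave behind.

```sh
set -eu

# Throwaway repository with a single empty commit to point refs at.
tmp=$(mktemp -d)
git init -q "$tmp/work"
cd "$tmp/work"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Hypothetical helper: succeeds only if a tracking ref for the cache's
# git-annex branch exists in this repository.
has_cache_annex_ref() {
    git show-ref --verify --quiet refs/remotes/cache/git-annex
}

if has_cache_annex_ref; then before=present; else before=absent; fi

# Simulate the ref that an unrestricted fetch from the cache would create.
git update-ref refs/remotes/cache/git-annex HEAD

if has_cache_annex_ref; then after=present; else after=absent; fi

echo "cache/git-annex before: $before, after: $after"
```

Running only the show-ref line in a real clone is enough to tell whether a cache/git-annex ref is sitting there for a merge to pick up.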
but you also have to avoid pulling from it yourself.
I think we do call out to annex merge from time to time to update information about annex object availability from any remote. Since sync does more, we avoid using it for those cases. git annex merge doesn't even care about any argument given to it, so we cannot simply avoid calling it on cache remotes by specifying all other remotes. Would it be possible to get some option like --only-pullable to make it prevent merging "caches"?
Sorry, I am still missing something.
I followed your example, so the cost for cache is 10, whereas for web it is the default 200:
[[!format sh """
$> git annex info cache web | grep -e remote: -e cost
remote: cache
cost: 10.0
remote: web
cost: 200.0
"""]]
but it still downloads from the web in the parallel download case -- so what am I missing?
[[!format sh """
~/datalad/openfmri/ds000001 > datalad get -J 1 sub-01/anat/sub-*_T1w.nii.gz
get(ok): /home/yoh/datalad/openfmri/ds000001/sub-01/anat/sub-01_T1w.nii.gz (file) [from cache...]
~/datalad/openfmri/ds000001 > git annex drop sub-01/anat/sub-*_T1w.nii.gz
drop sub-01/anat/sub-01_T1w.nii.gz (checking http://openneuro.s3.amazonaws.com/ds000001/ds000001_R1.1.0/uncompressed/sub001/anatomy/highres001.nii.gz?versionId=8TJ17W9WInNkQPdiQ9vS7wo8ZJ9llF80...) ok
(recording state in git...)
~/datalad/openfmri/ds000001 > datalad get -J 2 sub-01/anat/sub-*_T1w.nii.gz
get(ok): /home/yoh/datalad/openfmri/ds000001/sub-01/anat/sub-01_T1w.nii.gz (file) [from web...]
"""]]
Nothing in the --debug output hints at the costs:
[[!format sh """
~/datalad/openfmri/ds000001 > git annex get -J 2 --debug sub-01/anat/sub-*_T1w.nii.gz
[2018-08-02 13:28:03.896215705] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","sub-01/anat/sub-01_T1w.nii.gz"]
[2018-08-02 13:28:03.900141316] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2018-08-02 13:28:03.904139213] process done ExitSuccess
[2018-08-02 13:28:03.904230988] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2018-08-02 13:28:03.908376239] process done ExitSuccess
[2018-08-02 13:28:03.908608977] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..ff8578c5e3bdd1c67b2d9ca8082893fe6425f729","--pretty=%H","-n1"]
[2018-08-02 13:28:03.913502761] process done ExitSuccess
[2018-08-02 13:28:03.914221081] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2018-08-02 13:28:03.914683852] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2018-08-02 13:28:03.920509994] read: git ["config","--null","--list"]
[2018-08-02 13:28:03.925910945] process done ExitSuccess
get sub-01/anat/sub-01_T1w.nii.gz
[2018-08-02 13:28:03.926689119] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2018-08-02 13:28:03.9274736] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objectty(from web...) ze)"]
76% 4.12 MiB 859 KiB/s 1s
73% 3.96 MiB 842 KiB/s 1s
...
"""]]