Recent comments posted to this site:


Found a remote.cache.fetch setting that will prevent most accidents, though of course a determined footgun script may still find a way.

Comment by joey Fri Aug 3 18:09:26 2018

There are just too many ways the user could bypass such protections. Including, for example, configuring git to fetch from cache to origin/ tracking branches.

My concern is not really about making it impossible, but about making it unlikely or avoidable. It is similar to how you cannot completely prevent someone from merging the git-annex branch "manually" using regular git-merge with some -Sours to "avoid" the conflicts. That is unavoidable but very unlikely ;) ATM my problem is "likely" (likely enough that I, the first user of the feature, ran into it right away) and "unavoidable" (annex merge has no option/mode to avoid merging those). If we could avoid it somehow in those situations (e.g. through some option to annex merge), it would be great. My concern is that at present we cannot avoid it at all.

make it dead and use git-annex forget --drop-dead

Yep, we will add that information to some FAQ etc.; very useful. But it might be a bit too late if we have already shared that blown-up git-annex branch publicly and people have merged it into their own git-annex branches. If someone is advanced enough to configure git with alternative fetch settings, they could indeed resort to this.

Comment by yarikoptic Fri Aug 3 17:54:01 2018

I fear that preventing merging of branches fetched from the cache remote in git-annex would be a game of whack-a-mole. There are just too many ways the user could bypass such protections. Including, for example, configuring git to fetch from cache to origin/ tracking branches.

I remember at some point discussing isolating repos from one another so that data from one repo can't leak across a boundary to another repo, while still having it be a remote, and it was similarly just not tractable. I can't seem to find the thread, but it's basically the same problem.

If you do accidentally merge the git-annex branch from a cache remote, you can always make it dead and use git-annex forget --drop-dead.

If you really want to avoid any possibility of git fetching from the caching remote, make it a directory special remote! But, there is not currently any way to make annex.hardlink work for directory special remotes, so it will be less efficient.

Comment by joey Fri Aug 3 17:18:04 2018
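
The recovery described in the comment above could look roughly like the following. This is a minimal sketch, assuming the cache remote is named `cache` as elsewhere in this thread; the function name `forget_cache_remote` is hypothetical and not part of git-annex. Since `git annex forget` rewrites the git-annex branch, it is only defined here, not run.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: mark a remote dead, then prune it from the git-annex branch.
# Run it from inside the affected repository.
forget_cache_remote() {
    remote=${1:-cache}   # "cache" is the remote name used in this thread

    # Tell git-annex that this repository is gone for good.
    git annex dead "$remote"

    # Rewrite the git-annex branch, dropping references to dead repositories.
    git annex forget --drop-dead --force
}
```

Usage would be `forget_cache_remote cache` in the repository whose git-annex branch picked up the cache's history.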

The -J2 web bug was not related to caching remotes at all but was an accidental sort by remote uuid rather than cost. I've fixed it.

Comment by joey Fri Aug 3 16:30:35 2018

Well, git-annex merge does not fetch, it only merges refs it sees.

That is correct! My alias to fetch all remotes (useful for quickly catching up on the state of development in other people's feature branches) fetched the cache as well. Despite the viral nature of git tags, I consider fetching everything to be a good general approach. But fetching is not merging: I can remove any of those remotes at any moment should one become too heavy or something like that (tags are trickier).

IMHO annex merge should also not merge remotes which are not "pullable" by default. Maybe it could take remote name(s) as argument(s) and merge only the specified ones (ATM arguments seem to be silently ignored), should someone really need to merge any of those. That would prevent an accidental blow-up of the git-annex branch in case a cache remote gets fetched.

Comment by yarikoptic Thu Aug 2 18:49:51 2018
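
For the "fetch all remotes" alias case in the comment above, plain git has a per-remote `skipFetchAll` setting that may help. The sketch below is an illustration only: it assumes the alias boils down to `git fetch --all`, uses throwaway repositories under a temporary directory, and does not involve git-annex at all. The remote names `origin` and `cache` are stand-ins.

```shell
#!/bin/sh
set -eu

# Scratch area; every path below is a hypothetical stand-in.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two source repositories, each with one commit so there is something to fetch.
for name in origin cache; do
    git init -q "$tmp/$name"
    git -C "$tmp/$name" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$name commit"
done

# The working repository, with both added as ordinary git remotes.
git init -q "$tmp/work"
git -C "$tmp/work" remote add origin "$tmp/origin"
git -C "$tmp/work" remote add cache "$tmp/cache"

# Ask git to leave the cache remote out of "fetch --all" and "remote update".
git -C "$tmp/work" config remote.cache.skipFetchAll true

# What a fetch-everything alias is assumed to run.
git -C "$tmp/work" fetch -q --all

# Count the remote-tracking refs created for each remote.
origin_refs=$(git -C "$tmp/work" for-each-ref refs/remotes/origin | wc -l | tr -d ' ')
cache_refs=$(git -C "$tmp/work" for-each-ref refs/remotes/cache | wc -l | tr -d ' ')
echo "origin tracking refs: $origin_refs"
echo "cache tracking refs:  $cache_refs"
```

An explicit `git fetch cache` is not affected by this setting, so it only guards against the fetch-everything path.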

Well, git-annex merge does not fetch, it only merges refs it sees. With the configuration I gave in the tip, you will not have a cache/git-annex branch for it to merge.

Comment by joey Thu Aug 2 18:15:13 2018
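
One way to check which refs are visible is the same `git show-ref git-annex` call that appears in the `--debug` output elsewhere in this thread. The sketch below is an illustration under stated assumptions: it fakes a fetched `cache/git-annex` tracking ref in a throwaway repository using `git update-ref`, and it does not run git-annex itself.

```shell
#!/bin/sh
set -eu

# Scratch repository; the path is a hypothetical stand-in.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/work"
git -C "$tmp/work" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "placeholder"
head=$(git -C "$tmp/work" rev-parse HEAD)

# Simulate a local git-annex branch plus one that was fetched from the cache remote.
git -C "$tmp/work" update-ref refs/heads/git-annex "$head"
git -C "$tmp/work" update-ref refs/remotes/cache/git-annex "$head"

# List every ref whose name ends in "git-annex".
seen=$(git -C "$tmp/work" show-ref git-annex)
echo "$seen"
```

If `refs/remotes/cache/git-annex` shows up in a real repository, the cache remote's branch has been fetched at some point.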

but you also have to avoid pulling from it yourself.

I think we do call out to annex merge from time to time to update information about the availability of annex objects from any remote. Since sync does more, we avoid using it in those cases. git annex merge does not even look at any arguments given to it, so we cannot simply avoid cache remotes by listing all the other remotes. Would it be possible to get some option such as --only-pullable to prevent merging "caches"?

Comment by yarikoptic Thu Aug 2 17:35:22 2018

Sorry, I am still missing something.
I followed your example, so the cost for cache is 10, whereas for web it is the default 200:

[[!format sh """
$> git annex info cache web | grep -e remote: -e cost
remote: cache
cost: 10.0
remote: web
cost: 200.0
"""]]

but it does download from the web in the parallel download case, so what am I missing?

[[!format sh """
~/datalad/openfmri/ds000001 > datalad get -J 1 sub-01/anat/sub-*_T1w.nii.gz
get(ok): /home/yoh/datalad/openfmri/ds000001/sub-01/anat/sub-01_T1w.nii.gz (file) [from cache...]

~/datalad/openfmri/ds000001 > git annex drop sub-01/anat/sub-*_T1w.nii.gz
drop sub-01/anat/sub-01_T1w.nii.gz (checking http://openneuro.s3.amazonaws.com/ds000001/ds000001_R1.1.0/uncompressed/sub001/anatomy/highres001.nii.gz?versionId=8TJ17W9WInNkQPdiQ9vS7wo8ZJ9llF80...) ok
(recording state in git...)

~/datalad/openfmri/ds000001 > datalad get -J 2 sub-01/anat/sub-*_T1w.nii.gz
get(ok): /home/yoh/datalad/openfmri/ds000001/sub-01/anat/sub-01_T1w.nii.gz (file) [from web...]
"""]]

Nothing in the --debug output hints at the costs:

[[!format sh """
~/datalad/openfmri/ds000001 > git annex get -J 2 --debug sub-01/anat/sub-*_T1w.nii.gz
[2018-08-02 13:28:03.896215705] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","sub-01/anat/sub-01_T1w.nii.gz"]
[2018-08-02 13:28:03.900141316] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2018-08-02 13:28:03.904139213] process done ExitSuccess
[2018-08-02 13:28:03.904230988] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2018-08-02 13:28:03.908376239] process done ExitSuccess
[2018-08-02 13:28:03.908608977] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..ff8578c5e3bdd1c67b2d9ca8082893fe6425f729","--pretty=%H","-n1"]
[2018-08-02 13:28:03.913502761] process done ExitSuccess
[2018-08-02 13:28:03.914221081] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2018-08-02 13:28:03.914683852] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2018-08-02 13:28:03.920509994] read: git ["config","--null","--list"]
[2018-08-02 13:28:03.925910945] process done ExitSuccess
get sub-01/anat/sub-01_T1w.nii.gz
[2018-08-02 13:28:03.926689119] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2018-08-02 13:28:03.9274736] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objectty(from web...) ze)"]
76% 4.12 MiB 859 KiB/s 1s 73% 3.96 MiB 842 KiB/s 1s ...
"""]]

Comment by yarikoptic Thu Aug 2 17:28:46 2018