Recent comments posted to this site:

Thanks much, I did not think to use 'describe' for that, but that's very intuitive!
Comment by https://me.yahoo.com/a/80VlVB0Bx9TaNOXIj3OCQ8eimAtIOhqjUQ--#1e80e Tue Feb 14 20:21:41 2017

@Sundar, good question.

git annex enableremote will always refuse to enable the remote if a parameter is missing, and will prompt for it. Finding the right value is up to you. Most of the time no additional parameters are needed, or the parameters are fairly self-explanatory, e.g. login passwords for remote services.

The difficulty with directory special remotes is that my /foo may not be the same as your /foo, so it can't reuse the directory= that was provided to initremote, and it's up to you to enter the right directory path.

I think this needs to come down to documentation in the repository. The description of the remote (set by git annex describe) is a reasonable place to put that, unless you have somewhere better.
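For the curious, the configuration recorded at initremote time lives in remote.log on the git-annex branch (exactly which fields are recorded varies by remote type, and as noted above, a directory path may still need to be supplied by hand). The demo below fabricates a throwaway repo and a simulated git-annex branch purely so the inspection command is runnable; the UUID and field values are made up:

```shell
set -e
# Build a throwaway repo with a simulated git-annex branch. In a real
# annex repository the branch already exists; everything here is fake.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git checkout -q --orphan git-annex
printf '%s\n' '38e67e39-7dfb-45e8-90fc-8c5d01aae0b4 encryption=none name=annexation.dir type=directory timestamp=1487100000s' > remote.log
git add remote.log
git -c user.email=you@example.com -c user.name=demo commit -qm 'update'
# In a real repository, this shows what initremote recorded per remote UUID:
git cat-file -p git-annex:remote.log
```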

Comment by joey Tue Feb 14 18:16:52 2017

Is there a way to determine the parameters that an enableremote command must use, if one does not know them? The use case is as follows:

  • Dev 1 performs an initremote annexed-media directory=/path/to/media ...
  • Dev 1 syncs content
  • Dev 2 comes along (or Dev 1 comes along months later with a different machine) and clones the repo, but needs to know the directory=/path... in order to 'enableremote'.

Is there any way to glean this information from the source repo itself?

The steps would be:

```
dev1$ git clone git@gitserver:myproject.git && cd myproject
dev1$ mkdir images && touch images/foo1.png
dev1$ git annex initremote annexation.dir directory=/mnt/media/myproject.annex/ encryption=none
dev1$ git commit && git push && git annex sync --content
```

```
dev2$ git clone git@gitserver:myproject.git && cd myproject
dev2$ git annex whereis
```

which shows something like:

```
whereis images/foo1.png (7 copies)
    ...
    38e67e39-7dfb-45e8-90fc-8c5d01aae0b4 -- annexation.dir
```

```
dev2$ git annex enableremote annexation.dir directory=???
```

So how does the new developer know how to define the annexation.dir? Is there any way to extract from the repo itself? Or must this information be saved into the repo's documentation to avoid losing the reference?

Thanks!

Comment by https://me.yahoo.com/a/80VlVB0Bx9TaNOXIj3OCQ8eimAtIOhqjUQ--#1e80e Tue Feb 14 00:04:50 2017

I sometimes receive the following error when trying to upload files to glacier:

```
Traceback (most recent call last):
  File "/home/victor/bin/glacier", line 736, in <module>
    main()
  File "/home/victor/bin/glacier", line 732, in main
    App().main()
  File "/home/victor/bin/glacier", line 718, in main
    self.args.func()
  File "/home/victor/bin/glacier", line 500, in archive_upload
    file_obj=self.args.file, description=name)
  File "/usr/lib/python2.7/site-packages/boto/glacier/vault.py", line 178, in create_archive_from_file
    writer.close()
  File "/usr/lib/python2.7/site-packages/boto/glacier/writer.py", line 228, in close
    self.partitioner.flush()
  File "/usr/lib/python2.7/site-packages/boto/glacier/writer.py", line 79, in flush
    self._send_part()
  File "/usr/lib/python2.7/site-packages/boto/glacier/writer.py", line 75, in _send_part
    self.send_fn(part)
  File "/usr/lib/python2.7/site-packages/boto/glacier/writer.py", line 222, in _upload_part
    self.uploader.upload_part(self.next_part_index, part_data)
  File "/usr/lib/python2.7/site-packages/boto/glacier/writer.py", line 129, in upload_part
    content_range, part_data)
  File "/usr/lib/python2.7/site-packages/boto/glacier/layer1.py", line 1279, in upload_part
    response_headers=response_headers)
  File "/usr/lib/python2.7/site-packages/boto/glacier/layer1.py", line 119, in make_request
    raise UnexpectedHTTPResponseError(ok_responses, response)
boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)
gpg: [stdout]: write error: Broken pipe
gpg: DBG: deflate: iobuf_write failed
gpg: [stdout]: write error: Broken pipe
gpg: filter_flush failed on close: Broken pipe
gpg: [stdout]: write error: Broken pipe
gpg: filter_flush failed on close: Broken pipe
git-annex: fd:17: hPutBuf: resource vanished (Broken pipe)
```

It happens only sometimes; glacier-cli can upload files without problems. The progress display of the file upload is also erratic: it jumps to ~90% and then gets stuck. Can I do something to resolve this?

```
git-annex version: 5.20140717
build flags: Assistant Inotify DBus TDFA
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SHA256 SHA1 SHA512 SHA224 SHA384 WORM URL
remote types: git gcrypt bup directory rsync web glacier ddar hook external
```

I am using glacier-cli from git master.

Comment by victorsavu3 Thu Feb 9 20:45:42 2017

Suppose I have two Samba fileservers in two different locations. Can I use git-annex in thin mode plus the git-annex assistant to automatically synchronize these two fileservers? Specifically, I am trying to understand whether:

1) git-annex preserves owner/group IDs and POSIX ACLs;
2) it can efficiently manage a very large number of files/directories (500K+ files);
3) it can be used alongside inotify to efficiently transfer only changed files.

One last thing: in which sense is git-annex not like Unison? It seems it can be configured to provide very similar functionality. Am I missing something?

Thank you all.

Comment by shodanshok Mon Jan 16 17:41:07 2017

(My own question, answered by the "Use the Source, Luke" method)

If you need to point a type=S3 special remote at a service which provides only https (in my case, a local CEPH RADOS gateway) then you can do it by setting port=443.

This was implemented in commit 6fcca2f1; the next release tag was 5.20141203. On Ubuntu, that version is available in Xenial but not Trusty.
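As a sketch of the above (the remote name, hostname, and bucket are placeholders, not anything from my setup), the initremote invocation might look like:

```shell
# Hypothetical: an https-only S3-compatible endpoint, e.g. a local CEPH
# RADOS gateway. port=443 makes git-annex connect to it over https.
git annex initremote cephs3 type=S3 encryption=none \
    host=rgw.example.com port=443 bucket=my-annex-bucket
```

This needs git-annex 5.20141203 or newer, per the above.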

Comment by mca Thu Jan 5 13:10:43 2017

@davidriod you can do things like this with special remotes, as long as the special remotes are not encrypted.

I don't really recommend it. With such a shared special remote R and two disconnected git repos -- call them A and B -- some confusing situations can occur. For example, the only copies of some files may be on special remote R and in git repo B. A knows about the copy in R, so git-annex is satisfied there is one copy of the file. But now B can drop the content from R, which is allowed since the content is still in B. A is then left unable to recover the content of those files at all, since they have been removed from R.

Better to connect the two repositories A and B, even if you do work in two separate branches. Then if a file ends up located only on B, A will be able to say where it is, and could even get it from B (if B was set up as a remote).
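One partial safeguard against the drop scenario above is a numcopies setting (a sketch only: since disconnected repos don't share their git-annex branches, it helps only if each repository commits the setting itself):

```shell
# Require at least 2 verified copies before git-annex will drop content;
# the setting is recorded in the git-annex branch of the repo it is run in.
git annex numcopies 2
```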

Comment by joey Tue Dec 13 16:43:42 2016

Thank you for this, I've always wanted such a GUI, and it's been a common user request!

Comment by joey Wed Dec 7 19:58:11 2016
I was wondering if it is possible to share an rsync special remote between repositories which are not related in any way. The use case would be that even though these repositories are not related at all, they may still contain the same binary files. It would be useful to have a single rsync remote in order to reduce space usage. I think it could work, as the object names are based on their checksums, but I wonder if anyone has already tried that?
Comment by davidriod Thu Nov 24 19:23:42 2016

Been using the one-liner. Despite the warning, I'm not dead yet.

There's much more to do than the one-liner.

This post offers instructions.

First simple try: slow

It was slow (estimated >600s for 189 commits).

In tmpfs: about 6 times faster

I cloned the repository into /run/user/1000/rewrite-git, which is a tmpfs mount point. (The machine has plenty of RAM.)

There I also did git annex init, and git-annex found its state branches.

On the second try I also did

git checkout -t remotes/origin/synced/master

So that filter-branch would clean that, too.

There, the filter-branch operation finished in 90s on the first try and 149s on the second.

.git/objects wasn't smaller.

Practicing reduction on clone

This produced no visible benefit:

time git gc --aggressive
time git repack -a -d

Even after cloning and retrying on the clone. Oh, but I should have done git clone file:///path as advised in the git-filter-branch man page's section "CHECKLIST FOR SHRINKING A REPOSITORY".
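The difference can be sketched in a throwaway repo (all paths and names below are placeholders): a plain local-path clone hardlinks or copies the object files verbatim, while a file:// clone goes through the transport and writes a fresh pack, which is why the man page recommends it for shrinking:

```shell
set -e
# Build a small throwaway source repo.
tmp=$(mktemp -d) && cd "$tmp"
git init -q src && cd src
echo demo > file.txt
git add file.txt
git -c user.email=you@example.com -c user.name=demo commit -qm initial
cd "$tmp"
# Plain path clone: objects are hardlinked/copied as-is.
git clone -q "$tmp/src" plain-clone
# file:// clone: objects are repacked, leaving unreferenced ones behind.
git clone -q "file://$tmp/src" repacked-clone
ls repacked-clone/.git/objects/pack/
```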

This (as seen on https://rtyley.github.io/bfg-repo-cleaner/) was efficient:

git reflog expire --expire=now --all && git gc --prune=now --aggressive

.git/objects shrank from 148M to 58M.

All this was on a clone of the repo in tmpfs.

Propagating cleaned up branches to origin

This confirmed that filter-branch did not change the last tree:

git diff remotes/origin/master..master
git diff remotes/origin/synced/master synced/master

This, expectedly, was refused:

git push origin master
git push origin synced/master

On origin, I checked out the hash of the current master; then, from the tmpfs clone:

git push -f origin master
git push -f origin synced/master

Looks good.

I'm not doing the aggressive shrink now, because of the "two orders of magnitude more caution than normal filter-branch" recommended by arand.

Now what? Check that nothing precious is broken

I'm planning to do the same operation on the other repos, then:

  • if everything seems right,
  • if git annex sync works between all those fellows,
  • etc.,
  • then I would perform the reflog expire and gc prune on some, then all of them, etc.

Joey, does this seem okay? Any comment?

Comment by StephaneGourichon Thu Nov 24 11:27:59 2016