Recent comments posted to this site:

@pat: Sorry, I don't have such performance information for git-annex as I stopped using it ~2 years ago.
Comment by CandyAngel Fri Nov 19 17:43:35 2021

I have opened a bug for that issue: "borg special remote memory usage high for large borg repo".

It would be good if you could follow up there with details.

Comment by joey Fri Oct 8 15:04:42 2021
This worked like a charm.
Comment by bjornw Fri Oct 8 15:04:42 2021

It's already possible, and in fact very easy to support resuming downloads. When the filename that you're asked to download to already exists, you can simply check its size and resume downloading to it where the previous download left off.

When you send PROGRESS, the value should be the same as the current size of the file. That is always the case really, but the distinction between file size and amount you've downloaded only matters when resuming.
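As a rough illustration of the approach described above, here is a minimal Python sketch. It is not taken from git-annex or any existing remote: `resume_download`, `fetch_from` and `report_progress` are hypothetical names, and `fetch_from(offset)` stands in for whatever your remote's API offers for reading an object starting at a byte offset. The sketch checks the size of the existing file, appends only the missing data in order, and reports the current file size as the progress value.

```python
import os
from typing import Callable, Iterable


def resume_download(
    dest: str,
    fetch_from: Callable[[int], Iterable[bytes]],
    report_progress: Callable[[int], None],
) -> int:
    """Download to dest, resuming where a previous attempt left off.

    fetch_from(offset) is a hypothetical callable that yields the object's
    bytes starting at the given offset. report_progress(n) is a hypothetical
    callback that receives the progress value. Returns the final file size.
    """
    # Bytes already on disk from an earlier, interrupted attempt (0 if none).
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0

    # Append mode means data is only ever written in order at the end of
    # the file, so no holes are left in the middle of it.
    with open(dest, "ab") as out:
        for block in fetch_from(offset):
            out.write(block)
            offset += len(block)
            # The reported value is the current size of the file, not just
            # the number of bytes fetched during this run.
            report_progress(offset)

    return offset
```

In an external special remote, `report_progress` could be something like `lambda n: print(f"PROGRESS {n}", flush=True)`, assuming the remote talks to git-annex over stdout.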

Another way to support resuming, when talking to an API that does not, is to encourage users of your remote to configure chunking with a small enough chunk size. git-annex will then handle resuming by re-starting on the last incomplete chunk. In this case, you'll be downloading each chunk to a separate file, so you will not need to do anything to support resuming.

If a remote does any kind of out of order downloading (like bittorrent does), it needs to avoid writing to the file out of order, with holes in the middle of it. Such holes would mess up a resume of the download of the same object by another remote.

Comment by joey Fri Oct 8 15:04:42 2021
Thanks, that works great!
Comment by alex Fri Oct 8 15:04:42 2021

When I tried running git annex sync borg on a large (~6T) borg repo with many archives, git-annex spun until it used 52G of memory, then got OOM-killed.

I don't know if this is a memory leak or just trying to load too much, but it seems like this is a thing you should be able to do on a machine with 64G of RAM.

Comment by tomdhunt Mon Oct 4 14:16:14 2021

It would be helpful to allow special remotes to take advantage of git annex's ability to resume interrupted downloads for large files, especially on slow/unreliable connections. One way to implement this would be to allow the special remote to send a message asking git-annex what offset it intends to read at, then write a sparse file with only the needed data. I notice the testremote suite includes tests for resuming downloads at an offset, so it is possible no other changes would be needed.

Sparse files could be avoided by allowing the special remote to send a command indicating the offset at which the target file starts.

Does that sound like a reasonable design?

Comment by alex Mon Oct 4 14:16:14 2021

Running extract on very large files (system backups) can take too long (I killed it after it had run for several hours). In general, extract seems slow on tar.gz archives.

I added timeout 100s before the call to the tool in the pre-commit script:

LC_ALL=C timeout 100s $tool_exec "./$f" | ...

This lets the commit complete in a reasonable time, though some metadata is probably lost.

Comment by aurelf Fri Sep 10 16:47:39 2021

You could add a config to the script that skips over files larger than a certain size.

Or for that matter, the script could be adapted to filter the files to only include images/videos, using eg:

git annex find --mimetype='image/*' --or --mimetype='video/*'

Should be a fairly easy change, patches accepted.

Comment by joey Fri Sep 10 16:47:39 2021
Following the instructions here, I cannot enable the remote. The error message is: git-annex: Unknown remote name.. I assume this is because git annex does not create a uuid for the type=git special remote, presumably because none is set for the actual git remote (the annex-uuid key does not exist for the existing git remote with the same url). This is the relevant line generated in remote.log: autoenable=true location=<ssh-url> name=<name> type=git timestamp=1629118438.628919s. As you can see, there is no uuid at the beginning. Any idea whether this is a bug or whether the instructions are outdated?
Comment by matthias.risze Mon Aug 30 19:02:09 2021