Copying sparse files

Any update on the ability to copy sparse files?

In an old thread Leo said they were "esoteric", but they are actually very common now because most of the popular BitTorrent clients default to using them.

Copying sparse files should work. The copies may not be sparse but they'll have the same data.

What's the actual question/context that you want an update on?

Sorry, that's what I meant. You can copy them, but they expand and lose their sparseness.

It's a bit of a pain when you have large files that only take up a few kilobytes on disk but when you try to move them somewhere else they suddenly expand to a few gigs.

Is there any prospect of being able to copy sparse files and have the copies be sparse too? It's possible; I had to muck about with some scripts and this app: http://www.flexhex.com/docs/articles/sparse-files.phtml
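To make the "few kilobytes on disk, a few gigs when copied" point concrete, here is a minimal sketch that compares a file's logical size with the space it actually occupies. It assumes a POSIX system where `os.stat()` exposes `st_blocks` in 512-byte units; on Windows that field is not available and the allocated size would have to come from something like `GetCompressedFileSizeW`. The helper names are made up for illustration.

```python
import os

def logical_and_allocated_size(path):
    """Return (logical size, bytes allocated on disk) for path.

    Assumption: POSIX, where st_blocks counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

def make_file_with_hole(path, size):
    """Create a file of `size` bytes by writing only its final byte.

    On most POSIX filesystems the skipped region becomes a hole.
    On NTFS the file would first have to be marked sparse
    (FSCTL_SET_SPARSE), which this sketch does not do.
    """
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")
```

If the filesystem supports holes, the second number returned for such a file should be far smaller than the first; a copy made by a tool that reads and writes every byte would bring the two back in line.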

Your use-case is BitTorrent? The files will only be sparse until they finish downloading, and copying a file while it's only half-downloaded seems strange. (BitTorrent clients have the ability to move the files built-in, if you need to move the download to another place while it's still ongoing.)

It's actually quite common. A torrent file will contain multiple files, and the client needs to download parts of them to cover the start and end of the ones you want.

Another frequent use case is hard drives for virtual machines. People move them around and duplicate often.

As SSDs become common, sparse files no longer have any disadvantages, so they are getting more common too. I use them in commercial software to avoid needing to build a duplicate index beyond what the filesystem already offers.

Sorry but that still doesn't make sense to me. Why are you copying half-downloaded torrent data around? For what purpose? What would you do with the half-downloaded files you have copied to somewhere else?

Which VMs are we talking about? Are you sure the one you are using creates NTFS sparse files and isn't just calling something else "sparse"? e.g. VMware has an option to pre-allocate space (best for performance, but uses more room if the disks aren't full), and another option to create "sparse" disks which grow as needed, but I'm not sure the sparse option uses the NTFS sparse file feature. (Last I checked, I don't think it did, since the file sizes grow with the disk, which would not happen with a sparse file.)

Is the VM argument something you are actually using, or just a hypothetical?

If you remember, Leo, I asked about sparse files a few weeks ago. I use a different torrent client, and the completed file is marked with the sparse flag on my hard drive when it finishes downloading. I had to come up with a way to remove that flag before I could mount it, if it was a .iso file, for example.

As I also discovered, one of the solutions was to just copy it to another location, and then the sparse flag was removed. So, it's not just with uTorrent, BitTorrent, and other clients that this happens nowadays...

That's exactly the opposite of what Mojo-Chan is asking for, and already how Opus* works. (* And almost anything else that copies files.)

You want the files to not be sparse (to work around the bug in the Windows ISO mounting code where it fails with sparse files for some reason), and copying them in Opus is one way to achieve that. Mojo-Chan is asking for the files to remain sparse after copying.

Right, I realize that they lose the sparse flag when Opus copies them anywhere. The point I was trying to make was to address the statement above, where from what I understood, you said that the files would lose their sparse flag when they finished downloading... Perhaps I did not read that correctly?

For all files that are not mountable, I can see where retaining the sparse flag would be a nice feature to have.

Once downloaded, the files may still be flagged as "sparse" but all the data will be written into them, with no blank parts, so you gain literally nothing from them still being flagged as sparse and they're essentially the same as a normal file (just with a flag set saying "there may be sparse data in this file, even though there actually isn't anymore"). It's only useful to the BitTorrent client while it is downloading the files, not to anyone or anything else that might use the files once they're downloaded.

Why? What does it gain you?

We are talking about something that would impose an extra time/speed overhead on every file you copy, whether sparse or not (it takes time to check, similar to the time it takes to copy timestamps, attributes and other metadata, which can be very significant when copying lots of small files). It could introduce bugs into the core file copy code and would require extensive testing. It would run into problems with software and filesystems that don't implement sparse files correctly, as we found with the non-buffered I/O mode we added some years ago (and as you've found with the Windows ISO mounting software!). There would have to be a very good reason to implement this, and no one has put one forward so far.

OK. I'm out of my depth here, so I'll take your word on it..

Back to /lurk mode......

Didn't really come here to argue about the validity of my use-cases, just wanted to request the feature.... But since you asked.

These days many popular torrent clients use libtorrent, or at least behave the same way. The default is to create sparse files when downloading. Often the user doesn't want to download every file in the torrent, but the torrent client still has to fetch parts of those unwanted files because of the way BT works.

BT works on blocks. A block can contain parts of more than one file. So in almost every instance the start and end of a file will be in a block that is shared with another file. Meaning that even after the download completes there will be unwanted files (usually hidden in a .unwanted directory or similar) as well as the wanted ones.

Those unwanted files are sparse. It's a good thing too, because otherwise you would end up wasting gigabytes of disk space on them. But the moment you copy them around they suddenly expand.
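The boundary effect described above can be sketched in a few lines. This is only an illustration, assuming the usual layout where a torrent's files are laid end to end in one byte stream that is then cut into fixed-size pieces (the BitTorrent specification's name for what this post calls blocks). The function names and the sizes in the usage note are invented for the example.

```python
def pieces_touching_files(file_sizes, piece_length):
    """Map each file (by index) to the range of piece indices overlapping it.

    Files are concatenated into one stream and the stream is cut into
    fixed-size pieces, so a piece can straddle a file boundary.
    """
    result = []
    offset = 0
    for size in file_sizes:
        if size == 0:
            result.append(range(0))  # an empty file touches no piece
        else:
            first = offset // piece_length
            last = (offset + size - 1) // piece_length
            result.append(range(first, last + 1))
        offset += size
    return result

def files_needed_for(wanted, file_sizes, piece_length):
    """Return indices of every file that receives data when `wanted` is fetched."""
    spans = pieces_touching_files(file_sizes, piece_length)
    needed_pieces = set()
    for i in wanted:
        needed_pieces.update(spans[i])
    return [i for i, span in enumerate(spans)
            if needed_pieces.intersection(span)]
```

For example, with `file_sizes = [100, 250, 300]` and `piece_length = 128`, asking for only the first file still touches the second one, because piece 0 covers bytes from both. The second file then exists on disk with a small amount of real data and a large hole.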

Another common use case is moving your in-progress downloads to a new location on the same system.

As for VMs, VMware for example supports using sparse hard disk image files.

Finally, supporting this is rather easy. I already added a link to an example implementation (which I tested). Basically, when you see a sparse file you use FSCTL_QUERY_ALLOCATED_RANGES to find out which ranges actually contain data, and copy only those.
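As a rough sketch of that idea (not the linked implementation, and not how Opus does it), the copy step might look like the following once the allocated ranges are known. The `ranges` argument is assumed to be a list of `(offset, length)` pairs such as FSCTL_QUERY_ALLOCATED_RANGES would report; the query itself, and marking the destination sparse with FSCTL_SET_SPARSE on NTFS, are Windows-specific and left out. The function name is hypothetical.

```python
import os

def copy_allocated_ranges(src_path, dst_path, ranges, chunk=1 << 20):
    """Copy only the given (offset, length) ranges from src to dst.

    Regions outside `ranges` are never written, so they read back as
    zeros and can stay unallocated if the destination supports holes.
    """
    total = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # On NTFS the destination would be marked sparse here,
        # before any data is written (not shown in this sketch).
        for offset, length in ranges:
            src.seek(offset)
            dst.seek(offset)
            remaining = length
            while remaining > 0:
                data = src.read(min(chunk, remaining))
                if not data:
                    break  # the range runs past the end of the file
                dst.write(data)
                remaining -= len(data)
        # Extend the destination to the source's logical size.
        dst.truncate(total)
```

The copy ends up with the same contents and length as the source, but only the listed ranges are ever written. Whether the gaps really stay unallocated depends on the destination filesystem.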

If you don't want the files and didn't download them I don't understand why you're copying them around instead of deleting them.

Basically, from our end changing the copy code is complex and fraught with peril and we want to understand WHY a change would be useful or necessary before embarking on it.

The unwanted files do get deleted eventually, it's just you have to go in and manually clean stuff out from multiple directory trees and if you forget or miss one it's a multi-gigabyte bomb waiting to go off. Sometimes you don't notice immediately that the copy expanded either, so you end up wasting loads of space indefinitely.

It's not impossible to work around, but it's hard work and from the user's perspective it's not the expected behaviour for files being copied/moved to suddenly expand to take up 1000x more disk space.

I have no idea how many people would benefit from this, I'm just saying that I personally do this fairly frequently and more and more software is starting to use sparse files now that SSDs are common. I move VMs around a lot, for example, both at home and at work using that little tool I linked to.

It's got to the point where it might be worth me investing time in developing it into a more capable tool, but that would just be duplicating a lot of functionality that Opus already has.

Is there some middle ground, like doing it as a plugin or something? Opus can handle stuff like building the directory tree, various error conditions, respecting other user preferences like attribute copying etc. and just have some plug-in code to do the actual copy? The copy code is already written and not particularly complex anyway.

Thanks for explaining further!

We'll put it on the list to look at. Of course any changes would only be to the current version, not Opus 11.

In Opus 12.8.1 beta, we'll add this:

  • "Preferences / File Operations / Copy Attributes / Copy sparse files as sparse" option.
  • COPYSPARSE argument for the Copy command to override it for specific actions.

Thank you very much! That's excellent, I look forward to testing it.

I'll recommend the guys at work who deal with VMs a lot buy a license too, because they are fed up with that tool I linked to.

This is in the 12.8.1 beta which was released yesterday.

Thanks Jon, I'll test it out over the next few days. I've been away in China with internet restrictions so my apologies for the slow reply.