Feature Request: Join function enhancement

Posting here for some feedback to see if others might find this request useful...

Occasionally, I find myself needing to join split file parts back into a single, VERY large output file (tens of GBs). In such cases, I may not have enough disk space to accommodate BOTH the split files AND the rejoined target file, even though I have a respectable amount of free space overall (20+ GB).

In such cases, it would be super helpful if the Join function had an argument that made it behave in the following ways:

1.) Rather than writing the joined files to an entirely new output file, start appending the file parts directly to the first file in the join list. So rather than File1 -> New Output File <- File2... just begin by appending File2 directly to File1, and so on. I leave it to discussion whether the first file in the list should simply be "renamed" to whatever the user specifies as the target output filename...

2.) On its own, all the above does is save the disk thrashing and extra space consumed by copying the first file part into a new output file. However, even if that is NOT done... the behavioral change that would really be most advantageous in my scenario is a function argument and Join dialog checkbox to delete each file in the list as it finishes being joined to the target output file. This is different from what happens if you just stick a delete command at the end of a button that runs the raw Join command, which batch-deletes all the file parts in one go after the whole join completes.

For my particular scenario, the first behavior above doesn't really help me on its own withOUT the second... though if a user intends to delete the file parts after the join, even the first behavior alone saves some transient space usage, disk thrashing, and time to complete. I.e. it would still provide some value for some users if, say, you only had two file parts, where the first is very large and the second is small in relation. In that case, the total time to complete the join would be much better than it currently is.
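
To make the ask more concrete, here's a rough sketch of the two behaviors combined (Python, purely illustrative - nothing Opus-specific, and `join_in_place` is just a name I made up for the sketch):

```python
# Illustrative sketch only: grow the first part in place, freeing each
# source part as soon as it has been appended.
import os

def join_in_place(parts, chunk_size=1 << 20):
    """parts: ordered list of file paths; parts[0] becomes the output."""
    with open(parts[0], "ab") as out:          # append to the first part in place
        for part in parts[1:]:
            with open(part, "rb") as src:
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    out.write(chunk)
            out.flush()
            os.fsync(out.fileno())             # be sure the bytes are on disk
            os.remove(part)                    # free the space before moving on
```

The point being that the extra space needed at any moment is at most one part's worth, rather than the size of the whole joined file.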

At any rate - thoughts? Simply adding more disk space just isn't always a practical way around the dilemma I find myself in.

Seems like an okay idea to me, although we'd have to be very careful with error handling to make sure a failure part-way through left you in a state you could still join everything up from, since some of the source files would no longer be there.

What do we do at the moment if the destination runs out of disk space when joining? (It's not easy for me to check right now, and I figure you already know. :slight_smile:)

If we show an error message and offer to "retry" from the point that has been reached, and only delete the joined file if the operation is aborted, then I guess you could already do this manually by deleting the files before the one the progress dialog says is currently being joined. Whether that's good enough depends on how often you run into this scenario, of course.

(OTOH, if we delete the target as soon as there's an error, and retry starts from the beginning again, then you could delete the parts manually as each one is used up, but it would be risky: if you got distracted and ran out of disk space, you'd lose everything. I.e. you want that "disk full" error to be a safety net, at least.)

Can you:

  • open Part 1 for append, saving the current byte count (i.e. the end-of-file position)
  • append Part 2 to Part 1
    if success (no write errors): flush/close Part 1, delete Part 2, and move on to the next part
    if failure: truncate Part 1 back to the remembered size, flush/close Part 1, and abort

That's the kind of thing I had in mind.
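
Per-part, I'm imagining something like this (again just illustrative Python, not Opus internals - the truncate-on-failure rollback is the important bit):

```python
# Illustrative sketch: append one part onto the target, rolling the target
# back to its remembered length if anything goes wrong mid-append.
import os

def append_part(target, part, chunk_size=1 << 20):
    with open(target, "r+b") as out:
        out.seek(0, os.SEEK_END)
        start = out.tell()                     # remember the end-of-file position
        try:
            with open(part, "rb") as src:
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    out.write(chunk)
            out.flush()
            os.fsync(out.fileno())
        except OSError:
            out.truncate(start)                # roll back the partial append
            raise                              # abort; the part file is untouched
    os.remove(part)                            # success: delete the consumed part
```

On failure, the target gets truncated back to where it was and the failing part is still on disk, so a retry could pick up from exactly that part.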

(It could still go wrong with weird situations, e.g. anti-virus deciding what you just appended to the file makes it suspicious, and then locks all further attempts to modify the target file, even just to truncate it, but that's acceptable.)

[quote="leo"]
(It could still go wrong with weird situations, e.g. anti-virus deciding what you just appended to the file makes it suspicious, and then locks all further attempts to modify the target file, even just to truncate it, but that's acceptable.)[/quote]

Bleech. I didn't think about that, being an old-school *nix dev simpleton from a time less malicious, where writes were writes, and viruses were strictly in the domain of living creatures. I do not envy your domain.

Sorry in advance for such a lengthy reply:

Agreed - though I'd offer that since the "ask" is that you'd be deleting file parts AS they complete joining, I don't think it'd be wise to ever auto-delete the joined file under any circumstances... since that (perhaps partially) joined file will be the only place you have any copy of the file parts that have been joined and deleted up to that point (say you fill up the disk 7 parts into 10). More on this below, though I think you're thinking here of errors other than just running out of disk space (since a goal of this request is precisely to help prevent running out of disk space).

I'm actually not sure - I knew ahead of time I wouldn't have the disk space needed to do the join, so didn't bother. I can grab a USB key with not much space on it and let you know though :slight_smile:.

[quote="leo"]If we show an error message an offer to "retry" from the point that has been reached, and only delete the joined file if the operation is aborted, then I guess you could already do this manually by deleting the files before the one the progress dialog says is being joined at the moment. Whether that's good enough depends on how often you run into this scenario, of course.

(OTOH, if we delete the target as soon as there's an error, and retry starts from the beginning again, then you could delete the parts manually, as each one is used up, but it would be risky as if you got distracted and ran out of disk space then you'd lose everything. i.e. You want that "disk full" error to be a safety net, at least.)[/quote]
Well, I had thought about manually cleaning up the file parts as the join progressed - but to be honest, in my current scenario it's just not that practical. We're talking about such a huge output file that it's going to take QUITE a while to complete - a few hours, maybe. I can't be certain I'd be able to monitor the progress diligently enough, manually deleting the completed file parts until the join is far enough along that I could walk away without running out of space.

Really, if I had my way, I'd kick off such a join at the end of the day before calling it quits for the night, and hope to see an error-free joined file sitting in the dir when I wake up the next morning :slight_smile:.

Realizing that your concern over solid error handling was likely based on things other than simply running out of space - still, I was also thinking that perhaps Opus could do a basic check of the target folder's free disk space, say, before each file part is processed, and from that assess whether the join is likely to fill the disk. It wouldn't hurt much to do so for every join operation; but particularly if an option to delete file parts as they're joined were provided, you could treat a user enabling that option as a hint that they're worried about disk space, and only then do the free-space checks.

It wouldn't be foolproof, as other tasks on the destination storage could consume disk space while Opus is joining files... but it's better than starting to join a file part when it's obvious from the outset that it will fail. I'm less concerned about actually running out of space, though - the whole point of this request is to free up space as the join progresses so that I specifically DON'T run out. Without something "else" on the system consuming disk space while a big join is in progress, this request would reduce the free space needed from the total size of all the files being joined down to just one or two times the size of the largest file part...
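
For what it's worth, the kind of check I mean is nothing fancier than this (Python, illustrative only; the `headroom` safety margin is an arbitrary value I picked for the sketch):

```python
# Illustrative pre-flight check: does the destination volume plausibly
# have room for the next part? Heuristic only - free-space figures can
# lie (quotas, junctions, other writers consuming space concurrently).
import shutil

def has_room_for(part_size, target_dir, headroom=64 << 20):
    free = shutil.disk_usage(target_dir).free
    return free >= part_size + headroom        # leave an arbitrary safety margin
```

As I said, it can't be foolproof - something else could eat the space a moment later - but it would at least refuse appends that are obviously doomed from the start.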

Good point about manual deletion not really working with such huge, slow-to-join files. I overlooked that.

Checking space might make sense, although it's unfortunately difficult to work out accurately how much space is really available in a given directory (even on a drive that isn't in use by other things and changing underneath you - junctions, quotas, cluster sizes, metadata overheads, etc.), so we typically stay away from making decisions based on free space. (The join could still fail for other reasons, of course, as you say.)

So, yeah, it seems like a good idea.