Improve Copy Queue logic for Moves when src/dest same

Deipotent · May 23, 2016, 8:10pm

As mentioned previously, the first check would be to see whether a move has been initiated and the source and dest partition letters are the same, in which case it would never queue. It would then check the disk ID's against the existing queue names.
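A minimal sketch of that first check, in Python rather than anything Opus actually uses. The function name `should_bypass_queue` is made up for illustration, and it assumes the reason such a move never needs queuing is that a move within one drive letter is normally just a rename.

```python
import ntpath


def should_bypass_queue(is_move: bool, src_path: str, dst_path: str) -> bool:
    """Return True when a move stays on one drive letter and so need not be queued."""
    # ntpath parses Windows-style paths on any platform.
    src_drive = ntpath.splitdrive(src_path)[0].upper()
    dst_drive = ntpath.splitdrive(dst_path)[0].upper()
    # Paths without a drive part never bypass the queue in this sketch.
    return is_move and src_drive != "" and src_drive == dst_drive


print(should_bypass_queue(True, r"C:\Downloads\a.iso", r"C:\Archive"))   # True
print(should_bypass_queue(False, r"C:\Downloads\a.iso", r"C:\Archive"))  # False
print(should_bypass_queue(True, r"C:\Downloads\a.iso", r"D:\Archive"))   # False
```

Anything that returns False here would go on to the disk ID comparison against the existing queue names.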

Deipotent · May 25, 2016, 12:21pm

Thinking about this some more, what appears to be a simple problem is a bit more complex.

My suggestion will solve the problem of disk thrashing, but leads to inefficiencies, since some queued items could run immediately once the original job (ie. the one that the queue is named after) has finished. For example:

1st job is copy to different physical disk: C: > D: (queue named "C:D:1A2B:3C4D")
2nd job (started while 1st still running) is copy to different disk E: > D: (added to queue "C:D:1A2B:3C4D" since dest matches queue name)
3rd job (started after 1st ended and 2nd job running) is copy to same disk (could be different as end result is same) C: > C: (added to queue "C:D:1A2B:3C4D" since source+dest matches queue name)

Given the above scenario, the 3rd job will wait until the 2nd job has finished, but it could actually be run at the same time, since the physical disk is different.

The simple initial solution is to avoid disk thrashing, so I believe my suggestion is preferable to the current solution.

An ideal solution, which avoids disk thrashing, while also avoiding inefficiencies would be to allow multiple jobs in a single queue to run concurrently, if the src/dest ID's do not match. Opus could do the following when a new copy operation is initiated:

Check any existing queue names to see if it should be added to the queue (ie. if source or dest disk ID matches either/both ID's in queue name, then it's a match).
If it finds a matching queue, the job is added to the queue, as before, but Opus will also check if the source/dest ID's of the new job match any of the currently running jobs in this queue. If a match is found, the job will be queued (ie. waiting to run). If a match is not found, the job will be run immediately (ie. so multiple jobs in the same queue could be running concurrently).
To keep maximum efficiency, one more thing needs to be done - when a job finishes, Opus should check all non-running jobs in the same queue to see if they match any of the running jobs. Any which don't match can be run immediately.
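The "run immediately or wait" decision in the steps above could look roughly like this. It is an illustrative Python sketch only, not Opus code: the `Job` type, the `conflicts` and `can_run_now` helpers, and the disk ID "5E6F" for E: are all hypothetical, and a "match" is assumed to mean that two jobs share at least one physical disk ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A copy/move job described by the physical disk IDs it touches (hypothetical type)."""
    name: str
    src_id: str
    dst_id: str

    def disk_ids(self) -> frozenset:
        return frozenset({self.src_id, self.dst_id})


def conflicts(a: Job, b: Job) -> bool:
    """Assumption: two jobs clash if they share any physical disk."""
    return bool(a.disk_ids() & b.disk_ids())


def can_run_now(new_job: Job, running: list) -> bool:
    """A new job may start immediately only if it clashes with no running job."""
    return not any(conflicts(new_job, r) for r in running)


# Scenario from the post: job 2 (E: > D:) is running when job 3 (C: > C:) arrives.
job2 = Job("E: > D:", src_id="5E6F", dst_id="3C4D")
job3 = Job("C: > C:", src_id="1A2B", dst_id="1A2B")
print(can_run_now(job3, [job2]))  # True
```

With this rule job 3 starts alongside job 2 instead of waiting behind it, whereas a second C: > D: copy would still wait because it shares disk 3C4D.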

The ability to manually create new named queues complicates things further, since a job which has been added to one of these user-created queues could potentially clash with one of the auto-queued jobs. The solution is to also check any running jobs in manual queues before starting a new job running.

While all this may seem a little complicated, it's not overly complicated IMO. The main issue would be the UI that would be needed for running multiple jobs concurrently in a single queue.

However, since Opus' main job is file management, it would be time well spent, as it would maximise copy/move efficiency when performing multiple copy/moves.

Deipotent · May 25, 2016, 2:56pm

Thinking about this still further, my previous suggestion doesn't account for cases where you're relying on the order of jobs. Modifying the above example scenario:

1st job is copy to different physical disk: C: > D: (queue named "C:D:1A2B:3C4D")
2nd job (started while 1st still running) is copy to different disk E: > D: (added to queue "C:D:1A2B:3C4D" since dest matches queue name)
3rd job (started after 1st ended and 2nd job running) is copy to same disk (could be different as end result is same) C: > D: (added to queue "C:D:1A2B:3C4D" since source+dest matches queue name)
4th job (started after 1st ended and 2nd job running) is copy to same disk (could be different as end result is same) C: > C: (added to queue "C:D:1A2B:3C4D" since source+dest matches queue name)

Given the above scenario (and using my "ideal" solution), the 3rd will wait until the 2nd job has finished (since dest's are same), while the 4th job would start, since the src+dest are on different physical disks.

A possible solution would be to check the disk ID's of the 4th job (ie. the one Opus is considering running) against the jobs currently running in ANY queue (in this case job 2 as there is only 1 queue in the example), as well as any non-running jobs preceding it in the queue (in this case job 3). If it matches either, job 4 would not be run immediately. This could be made optional, so the user can decide whether they want queue order kept intact, where appropriate. An even more optimal solution would be to check if the paths of job 4 conflict with job 3 (eg. is job 4 src/dest folder structure a parent/child of job 3 src/dest folder structure), and always run job 4 immediately if they don't conflict.
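A rough Python sketch of that order-preserving check, using the four-job scenario. The names (`Job`, `shares_disk`, `may_start`, `ensure_order`) and the disk ID "5E6F" for E: are invented for illustration, and the path-level parent/child refinement is not attempted here.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    src_id: str  # physical disk ID of the source
    dst_id: str  # physical disk ID of the destination


def shares_disk(a: Job, b: Job) -> bool:
    """True if the two jobs touch at least one common physical disk."""
    return bool({a.src_id, a.dst_id} & {b.src_id, b.dst_id})


def may_start(candidate, running_any_queue, waiting_ahead, ensure_order=True):
    """Decide whether a job may start now.

    running_any_queue: jobs currently running in any queue.
    waiting_ahead: non-running jobs that precede the candidate in its queue.
    """
    if any(shares_disk(candidate, r) for r in running_any_queue):
        return False
    # Optional: do not overtake an earlier waiting job that uses the same disk.
    if ensure_order and any(shares_disk(candidate, w) for w in waiting_ahead):
        return False
    return True


job2 = Job("E: > D:", "5E6F", "3C4D")  # running
job3 = Job("C: > D:", "1A2B", "3C4D")  # waiting behind job 2
job4 = Job("C: > C:", "1A2B", "1A2B")  # the job being considered

print(may_start(job4, [job2], [job3]))                      # False
print(may_start(job4, [job2], [job3], ensure_order=False))  # True
```

With the option on, job 4 is held back because it shares disk 1A2B with the waiting job 3. With it off, job 4 only has to avoid the running job 2.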

Also, the UI issue I mentioned previously isn't really an issue, as when a 2nd/3rd etc. job from the same queue starts, it is just detached from the queue (in much the same way as when you click the "Run job immediately" button for a queued job).

In summary, Opus could add one new checkbox option, "Ensure copy queue job order where appropriate", which would solve scenario above. Opus would then use the following decision making when a copy/move is initiated:

// Following is performed when a new copy/move is initiated
//
If (Move initiated && Src+Dest letter same) {
	Run job immediately
} else {
	If (srcDiskID || dstDiskID matches existing queue name) {
		Add new job to existing queue
		If (new job srcID && destID DOES NOT MATCH ID's of any other running jobs in ANY queue) {
			// Run new job if user doesn't want to ensure copy queue order OR preceding jobs in queue don't match new job
			If (	ensureCopyQueueOrder == FALSE ||
					(new job srcID || destID DOES NOT MATCH ID's of any preceding non-running jobs in this queue)) {
				Run new job immediately
			}
		}
	} else {
		Create new queue named using concatenation of src/dest drive letters and physical disk ID's (eg. "D:E:1A2B:1A2B" means source drive is D: with physical disk ID of "1A2B" and dest is E: with ID "1A2B" - ie. D: and E: are on same physical disk in this case)
		Add new job to new queue and run job
	}
}

// Following is performed when a copy/move job finished
//
- Check all non-running jobs in the queue of job that has just finished, to see if the src/dest ID's match any of the running jobs src/dest ID's
- Any jobs which don't match can be run immediately. (NOTE: if a non-running job doesn't match and is started, the subsequent checks of non-running jobs should also check against this job)
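The job-finished step might be sketched like this in Python. It is only an illustration under assumptions: jobs are plain `(name, src_id, dst_id)` tuples, the helper names and the disk ID "5E6F" are hypothetical, and the optional queue-order check is left out to keep it short.

```python
def shares_disk(a, b):
    """Jobs are (name, src_id, dst_id) tuples; True if they share a physical disk."""
    return bool({a[1], a[2]} & {b[1], b[2]})


def on_job_finished(finished, running, waiting):
    """Remove the finished job, then start every waiting job that no longer clashes.

    Jobs started during this pass are appended to `running` straight away, so
    later waiting jobs are also checked against them (the NOTE above).
    """
    running.remove(finished)
    started = []
    for job in list(waiting):  # iterate over a copy because `waiting` is mutated
        if not any(shares_disk(job, r) for r in running):
            waiting.remove(job)
            running.append(job)
            started.append(job)
    return started


job_a = ("C: > D:", "1A2B", "3C4D")
job_b = ("C: > C:", "1A2B", "1A2B")
job_c = ("C: > D: (second)", "1A2B", "3C4D")
job_d = ("E: > E:", "5E6F", "5E6F")

running = [job_a]
waiting = [job_b, job_c, job_d]
started = on_job_finished(job_a, running, waiting)
print([j[0] for j in started])  # ['C: > C:', 'E: > E:']
print([j[0] for j in waiting])  # ['C: > D: (second)']
```

The second C: > D: job stays queued only because the C: > C: job was started a moment earlier in the same pass, which is exactly the case the NOTE describes.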

It's highly possible I still haven't thought of some scenario, but I think it should work. My brain's gone to mush now thinking of the various possibilities, so time to take a break.

As mentioned previously, the above is an ideal solution, but initially I would be happy with just avoiding disk thrashing, which can be achieved by using both src AND dest disk ID's to check if it should be queued or run.

Leo · May 25, 2016, 3:28pm

And this is exactly why we keep automatic queues simple, while allowing you to explicitly queue things in situations where you need more control.

Deipotent · May 25, 2016, 4:06pm

Any chance you can change the auto queuing system to avoid disk thrashing, by checking both source and dest disk ID's?

Jon · June 15, 2016, 10:35pm

You can use the new OnGetCopyQueueName script event in Opus 12 Beta 7 to improve (complicate) this as much as you like. Have fun.

Deipotent · June 16, 2016, 11:03am

I would still like to see the auto queue system check both source and dest disk ID's when determining whether to queue or not, so it avoids disk thrashing by default. The current auto-queue logic does not avoid thrashing in all cases.

Deipotent · July 20, 2016, 12:42pm

Can this make it into Opus 12 (or the subsequent point release)?

I know the queue can be configured, but one of the main reasons for a copy queue was to avoid disk thrashing and the default logic does not accomplish this.

The fix is relatively easy: use both source and dest disk ID's in the queue name and also when checking if an operation should be queued (ie. if either the source or dest disk ID appears in the queue name). This would avoid all disk thrashing with the default queue logic, which is what most people expect the queue to achieve.
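As a rough illustration of that rule (Python, not the actual Opus logic), the sketch below builds queue names in the "C:D:1A2B:3C4D" format used earlier in the thread and queues a new operation if either of its disk ID's appears in an existing name. The function names are hypothetical, the disk ID "5E6F" is made up, and it assumes disk ID's never contain a colon.

```python
def make_queue_name(src_letter: str, dst_letter: str, src_id: str, dst_id: str) -> str:
    """Build a name like 'C:D:1A2B:3C4D' from drive letters ('C:') and disk IDs."""
    return f"{src_letter}{dst_letter}{src_id}:{dst_id}"


def queue_disk_ids(queue_name: str) -> set:
    """Extract the two physical disk IDs from a queue name (assumes no ':' inside IDs)."""
    parts = queue_name.split(":")  # e.g. ['C', 'D', '1A2B', '3C4D']
    return set(parts[2:4])


def should_join(queue_name: str, src_id: str, dst_id: str) -> bool:
    """Queue the operation if either of its disks is already used by this queue."""
    return bool(queue_disk_ids(queue_name) & {src_id, dst_id})


name = make_queue_name("C:", "D:", "1A2B", "3C4D")
print(name)                               # C:D:1A2B:3C4D
print(should_join(name, "5E6F", "3C4D"))  # True
print(should_join(name, "5E6F", "5E6F"))  # False
```

Here an E: > D: copy joins the queue because it shares the destination disk, while an E: > E: copy touches neither disk and would get a queue of its own.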