General optimizations that would be nice in Opus

While Opus makes use of multiple threads for off-loading tasks into a separate thread, it does not always use multiple threads to perform a specific task (e.g. multi-threaded directory scanning, folder deletion, comparison (for the Sync tool), FTP upload/download (including multi-part), etc.). I'd like to see some of these core file-handling tasks optimized to better multi-thread the actual work being performed, along with using the MFT directly (like Everything) where possible. Even though SSDs have sped up file/folder operations a lot, they can be sped up more to maximise the throughput of modern systems. I know some of these have been mentioned before, but I thought I'd group them together in a single place for reference.

  1. Use the MFT directly (like Everything) for file/directory enumeration, with an optional database, and monitoring changes. This is fine for getting basic details like file/folder names and basic attributes (size, date created/modified, etc.) and would make lister population very fast, while any further attributes/info can be fetched in background thread(s) if needed, as is already done. See the sketch after this list. Some references:

  2. Use multi-threaded folder deletion/scanning. An open-source tool called ByeNow deleted a large folder containing lots of sub-folders and files about 3x faster than Opus (with a cold file cache) on a modern SSD-based laptop. The two main traversal options are Breadth-First Search and Depth-First Search; I believe BFS is the better option in this scenario. While external tools can be used, it would be nice if this core feature could be improved in DOpus, which would also keep the nice progress dialog.

  3. Multi-threaded/multi-part FTP/SFTP upload/download (lots of posts about this). Beyond basic needs, a lot of people just end up using something like FileZilla, which is a shame as it makes the Opus built-in FTP/SFTP feature a bit redundant, and I personally (and I'm sure other people) would prefer to stay in Opus.

  4. A more specific one: cache the file extensions displayed in the File Filter Field toolbar item. It seems to re-read all the files whenever the toolbar item's drop-down is clicked, leading to a delay before the drop-down menu is shown. This is very noticeable when FlatView is used and the list contains a lot of items (I noticed it when there were over a million items in the FlatView).

  5. Make better use of caching in general, where possible and within memory constraints.
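
For item 1, NTFS exposes bulk MFT enumeration through the FSCTL_ENUM_USN_DATA control code, which (as far as I know) is the interface tools like Everything are built on. A minimal sketch that just counts records, assuming an elevated process and C: as the target volume:

```cpp
// Sketch: enumerate NTFS MFT records in bulk via FSCTL_ENUM_USN_DATA.
// Requires administrator rights; "C:" is an assumed example volume.
#include <windows.h>
#include <winioctl.h>
#include <cstdio>
#include <vector>

int main()
{
    // Open the raw volume (this is what needs elevation).
    HANDLE hVol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                              FILE_SHARE_READ | FILE_SHARE_WRITE,
                              nullptr, OPEN_EXISTING, 0, nullptr);
    if (hVol == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    MFT_ENUM_DATA_V0 med = {};       // start from the first file record
    med.StartFileReferenceNumber = 0;
    med.LowUsn  = 0;
    med.HighUsn = MAXLONGLONG;

    std::vector<BYTE> buf(1 << 16);  // 64 KB batch buffer
    DWORD bytes = 0;
    ULONGLONG count = 0;

    // Each call returns a batch of USN records; the first 8 bytes of the
    // output buffer are the file reference number to resume from.
    while (DeviceIoControl(hVol, FSCTL_ENUM_USN_DATA, &med, sizeof(med),
                           buf.data(), (DWORD)buf.size(), &bytes, nullptr)) {
        BYTE* p = buf.data() + sizeof(USN);
        while (p < buf.data() + bytes) {
            auto* rec = reinterpret_cast<USN_RECORD*>(p);
            ++count;  // rec->FileName / rec->FileNameLength hold the name
            p += rec->RecordLength;
        }
        med.StartFileReferenceNumber = *reinterpret_cast<DWORDLONG*>(buf.data());
    }
    printf("Enumerated %llu MFT records\n", count);
    CloseHandle(hVol);
    return 0;
}
```

Each record carries the file name plus a parent reference, so the whole tree can be rebuilt in memory without ever touching individual directories, which is what makes this approach so fast.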

I'll update this list with other ideas (mine or other people's), as and when they occur.

Reserved

Reserved 2

Using multiple threads for local filesystem operations like file deletion will usually slow things down, not speed them up. At least, that was generally true with devices and filesystems in the past.

If you have proof-of-concept examples and benchmarks to back up the idea, and aren't just assuming it would be better without testing, then we'll take a look at them.

On a relatively large folder tree, ByeNow was much faster at deletion in my testing, with a cold file cache (i.e. rebooted between tests). I wasn't sure whether it's only the scanning part that's multi-threaded or also the deletion (update: both scanning and deletion are multi-threaded, per the author), but it was definitely a lot faster than DOpus.

Although I mentioned in my original post that this was on an SSD-based system, it might've been on an external USB HDD, as I can't find my original notes. I'll confirm results on internal SSD, external USB HDD and USB SSD, and post back here.

Another useful free tool, which can be used to analyse sequential read/write speeds using various buffer configurations, is CCSIOBench, which also includes a detailed article on Fast Bulk I/O.

Further information on multi-threaded file system operations can be found on the Bvckup development blog/wip.

The development blog is a treasure trove of information related to optimizing file system operations (e.g. an asynchronous parallel disk scanner that dynamically rearranges its scanning queue for the TreeView).

There's also a test tool for multi-threaded directory scanning so you can see for yourself, along with other people's results in the linked forum thread.

CCSIOBench looks like it's about reading/writing file data, not about reading directories or deleting files. Doesn't look relevant from a quick look?

The blog post: I'm not sure what "backup processing" entails but I suspect it's a lot more than just reading the directories. Processing System32 + subfolders, as in their example, would not take 5-17 seconds on even a slow HDD if it was just reading the directory listings, so they must be talking about something more than that. They say the parallel method reduced it to just over 5 seconds, but reading the whole directory the normal way (the one the Windows Properties dialog uses) only takes a couple of seconds here as it is.
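
For reference, the "normal way" is essentially a single-threaded recursive walk using the FindFirstFileEx family, something like this minimal sketch (error handling trimmed; System32 used only because it's the tree from their example):

```cpp
// Sketch: conventional single-threaded recursive directory scan,
// roughly what a normal directory walk boils down to on Windows.
#include <windows.h>
#include <string>
#include <cstdio>

static ULONGLONG g_files = 0;

void Scan(const std::wstring& dir)
{
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileExW((dir + L"\\*").c_str(), FindExInfoBasic, &fd,
                                FindExSearchNameMatch, nullptr,
                                FIND_FIRST_EX_LARGE_FETCH);
    if (h == INVALID_HANDLE_VALUE) return;
    do {
        if (wcscmp(fd.cFileName, L".") == 0 || wcscmp(fd.cFileName, L"..") == 0)
            continue;
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            Scan(dir + L"\\" + fd.cFileName);  // one directory at a time
        else
            ++g_files;
    } while (FindNextFileW(h, &fd));
    FindClose(h);
}

int main()
{
    Scan(L"C:\\Windows\\System32");  // example tree from the blog post
    printf("%llu files\n", g_files);
    return 0;
}
```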

The test tool: Looks relevant to dir reading (not deletion) but the speed differences people report seem very small, and vary a lot from run to run as well as system to system.

It would need to be a lot faster to justify the added complexity, which includes working out when not to do it (because it will be slower on many device types, and it will cause problems for some people) for a marginal improvement when it does work better. That's in addition to the complexity of error handling and the error reporting/retry/skip UI if it's doing things in parallel.

Even if it's slightly faster on an NVMe SSD, if it was done all the time it is likely to only speed up things that are already extremely fast (NVMe SSD directory read times are already almost instant) and could also make slow things even slower (physical HDDs will not be faster reading uncached directories in parallel, I'm fairly certain; network drives probably depend on a lot of factors, which makes things even more complicated). Making fast things slightly faster, and slow things slower, would not be a good trade-off at all.

We've been down this kind of road before* and wasted weeks on what turned out to be an idea that was interesting in theory but in reality made very little difference, and actually caused problems for some people with some device types, ultimately being turned off by default because it was causing more problems than it helped with.

(* In fact, many years ago I suggested parallel copying for small files, and Jon told me to prove it, so I wrote the code and... it wasn't faster at all, heh. But there's also the non-buffered I/O stuff we added to Opus a few years after that, which ended up causing problems for very little gain in the end, and is now off by default.)

FWIW, we have some other things planned that will speed some of this up, but we aren't ready to talk about those things yet.


The main tool the developer is selling is a file synchronization tool, but every part of it is highly optimized, and some of the blog posts (not just the ones I specifically linked to) explain details about how they've experimented, along with their results.

BTW, it was DFS that was faster in this scenario, not BFS (BFS is possibly better for searching).

I'm not saying Opus needs to squeeze every last drop of performance from it, but in my experience with Opus certain operations could be optimized. Another example is the Opus Sync tool, where the initial read of the D:\ SSD system partition (18,649 folders, 216,756 files, as reported by Opus) isn't too bad, but is still slower than multi-threaded (on a warm cache)...

  • 0.5 secs for blog parallel scanning test tool using defaults with 8 threads
  • 4 secs for Opus Sync tool (I realise it may be doing extra things) during initial read when file/folder count stops increasing (and 9 secs until the Comparing files progress dialog is displayed).

CPU usage is about 50% during read with test tool, but only around 12% with Opus.
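
For anyone curious what that kind of test tool does, here's a minimal sketch of a parallel breadth-first scanner with a shared work queue; the 8 threads and D:\ root are just the settings from my test above, and real code would need proper error reporting:

```cpp
// Sketch: parallel directory scan, N worker threads sharing a queue of
// directories. Counts files; extend the loop body to collect attributes.
#include <filesystem>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <deque>
#include <atomic>
#include <vector>
#include <cstdint>
#include <cstdio>

namespace fs = std::filesystem;

int main()
{
    const fs::path root = L"D:\\";  // assumed test root from my numbers above
    const unsigned nThreads = 8;    // matches the tool's default that I used

    std::deque<fs::path> queue{root};
    std::mutex m;
    std::condition_variable cv;
    std::atomic<std::uint64_t> files{0};
    unsigned busy = 0;              // workers currently scanning a directory
    bool done = false;

    auto worker = [&] {
        for (;;) {
            fs::path dir;
            {
                std::unique_lock lk(m);
                cv.wait(lk, [&] { return !queue.empty() || done; });
                if (done && queue.empty()) return;
                dir = std::move(queue.front());
                queue.pop_front();
                ++busy;
            }
            std::error_code ec;
            for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
                if (it->is_directory(ec)) {
                    // Found a subfolder: hand it to whichever thread is free.
                    std::lock_guard lk(m);
                    queue.push_back(it->path());
                    cv.notify_one();
                } else {
                    ++files;
                }
            }
            {
                // Finished: if nothing is queued and nobody is scanning, stop.
                std::lock_guard lk(m);
                if (--busy == 0 && queue.empty()) { done = true; cv.notify_all(); }
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nThreads; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    std::printf("files seen: %llu\n", (unsigned long long)files.load());
    return 0;
}
```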

The worst part though is the Comparing stage in Opus, which is painfully slow even when only comparing by Size and the destination is EMPTY (apart from the "System Volume Information" folder) - an empty NTFS-formatted RAM disk created with SoftPerfect RAMDisk (Opus CPU usage around 12% again)...

  • 36 mins 46 secs in Opus (it takes 6m 21s to get to 50%)

  • 8 secs in Beyond Compare Folder Sync (without actual Sync)

The CCSIOBench tool was more for optimizing the copy buffer size, as it automatically tests lots of combinations of copy buffer sizes and other settings. Opus users could maybe run this tool on their systems and post results, which could be used to optimize the defaults in Opus, or to help users customise them.
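
For context, the buffer size being tuned is the one used by the basic read/write loop at the heart of any copier. A minimal sketch (the paths and the 1 MB starting size are just placeholders; a real copier would also preserve attributes, timestamps and ACLs):

```cpp
// Sketch: plain buffered file copy with a tunable buffer size.
#include <windows.h>
#include <cstdio>
#include <vector>

bool CopyWithBuffer(const wchar_t* src, const wchar_t* dst, size_t bufSize)
{
    HANDLE in = CreateFileW(src, GENERIC_READ, FILE_SHARE_READ, nullptr,
                            OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (in == INVALID_HANDLE_VALUE) return false;
    HANDLE out = CreateFileW(dst, GENERIC_WRITE, 0, nullptr,
                             CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (out == INVALID_HANDLE_VALUE) { CloseHandle(in); return false; }

    std::vector<BYTE> buf(bufSize);
    DWORD got = 0, put = 0;
    bool ok = true;
    // Larger buffers mean fewer syscalls; past a point they stop helping
    // and can hurt cache behaviour, hence benchmarking for the sweet spot.
    while (ReadFile(in, buf.data(), (DWORD)buf.size(), &got, nullptr) && got) {
        if (!WriteFile(out, buf.data(), got, &put, nullptr) || put != got) {
            ok = false;
            break;
        }
    }
    CloseHandle(in);
    CloseHandle(out);
    return ok;
}

int main()
{
    // 1 MB buffer as an example starting point for tuning.
    return CopyWithBuffer(L"C:\\temp\\in.bin", L"C:\\temp\\out.bin", 1 << 20) ? 0 : 1;
}
```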

The ByeNow tool I posted is more about parallel scanning of dirs and parallel deletion, and in my original test it was approx. 3x faster than Opus when deleting a relatively large folder containing lots of files/sub-folders (from memory it was something like 30 seconds compared to maybe 80 or 90 secs in Opus, which is a decent speed increase - I will test again with both cold and warm caches and post here with more specifics).


This would be better if we stuck to a single operation, as what bottlenecks one operation won't be the same as what bottlenecks another.

The sync tool in Opus spends most of its time building and comparing file lists in memory and adding them to the file display. (Something I believe we're going to improve soon, although it's not work I'm involved in myself, so my knowledge is limited.) It isn't disk/filesystem bound, so the assumption that multithreading the directory-reading part would be a big win is most likely incorrect.

Deleting files, which I think is what we started talking about, will be very different to the sync tool. (Also keep in mind we have no control over the usual case of deleting to the recycle bin, which Windows handles.)

Hi there!

I hope this is the right place to post my suggestion.

I have to copy a lot of files on a daily basis, and every time I start a copy job,
the progress dialog pops up in the middle of the screen.
I then have to shove this progress dialog away to, e.g., the right of the DOpus window so I can see the new files I have to handle.

This isn't a big deal, but if I have to do this 20+ times a day, it becomes somewhat tedious,
especially if the window only lasts for about a minute, and I am too slow to queue the next copy job in time.

It would be nice to have some sort of fixed position where this transfer window pops up.
(Either set a position in the settings or have some sort of "Pin" button to fix it in my desired place.)

That's all, thanks for reading. :slight_smile:

This thread is not the right place to post that, no. I already moved your previous post of the same thing into a new thread:

I'm currently really busy with other things, which is the reason I haven't posted info related to this yet. I will get around to it though.

I agree with not confusing which optimizations relate to which operation, and will create a new thread for each specific operation for future reference when I have more detailed information/benchmark results, even if they don't show much of an advantage. I'll still post in this thread with general information, as a place to collate general notes/ideas, some of which may end up in the other specific threads.

For disk-related operations (e.g. directory traversal), I think SSDs would probably benefit most (NVMe in particular), as is borne out by SSD benchmarks, where the best speeds and IOPS generally occur at a higher queue depth (i.e. multiple operations in flight instead of just one at a time). Even with HDDs, though, some of my basic tests have shown some improvements (more later, when I've done more thorough testing).

Great to hear there's work planned on improving the Sync tool speed (specifically the compare part, but also the build-file-list part). Not sure if there is a bug in the Compare part or just an inaccurate progress update, as progress increases faster at the start but seems to slow down the further along it gets.

I understand you probably won't be able to improve delete to recycle bin much, but I'm mainly interested in improving performance of actual permanent delete, which you can control.

A couple of articles related to fast folder deletion - the first one has some benchmarks of each method at the end.
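
For illustration, here's a minimal sketch of one plausible two-phase approach to parallel permanent deletion (delete files in parallel, then remove directories deepest-first); the thread count and target path are assumptions, and this isn't necessarily how ByeNow structures it internally:

```cpp
// Sketch: parallel permanent delete (no recycle bin involvement).
#include <filesystem>
#include <thread>
#include <atomic>
#include <vector>
#include <algorithm>
#include <cstdio>

namespace fs = std::filesystem;

int main()
{
    const fs::path root = L"D:\\delete-me";  // assumed example target
    const unsigned nThreads = 8;

    // Single-threaded enumeration; this part could itself be parallelised
    // with a work queue like the scanner sketch earlier in the thread.
    std::vector<fs::path> files, dirs;
    std::error_code ec;
    for (fs::recursive_directory_iterator it(root, ec), end; !ec && it != end; it.increment(ec)) {
        if (it->is_directory(ec)) dirs.push_back(it->path());
        else                      files.push_back(it->path());
    }

    // Phase 1: delete files in parallel; workers pull indices atomically.
    std::atomic<size_t> next{0};
    auto worker = [&] {
        for (size_t i = next++; i < files.size(); i = next++) {
            std::error_code e;
            fs::remove(files[i], e);  // permanent delete, bypasses recycle bin
        }
    };
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nThreads; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();

    // Phase 2: directories must go deepest-first. A child's path is always
    // strictly longer than its parent's, so sorting by length descending
    // guarantees children are removed before their parents.
    std::sort(dirs.begin(), dirs.end(),
              [](const fs::path& a, const fs::path& b) {
                  return a.native().size() > b.native().size();
              });
    for (const auto& d : dirs) fs::remove(d, ec);
    fs::remove(root, ec);  // finally remove the (now empty) root itself

    std::printf("deleted %zu files, %zu directories\n", files.size(), dirs.size());
    return 0;
}
```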