Opus freezes during large USB file transfers

I really like the idea of having a threshold for two different copy strategies, but I propose letting the default value (if not overridden by the user) depend on the physically installed system memory.

I like the idea too, but I would prefer a pre-defined threshold of e.g. 128 MB that can be overridden by the user.

I do not like these magic self-adjusting settings.

Would buffering make any difference on small files, either way?

[ul][li]If it does:

(e.g. it allows several new small files to be written into the write-cache before the old one has finished being written to disk.)

It may make sense to disable it for small files as well, if the aim is to make the progress bar accurate on slow USB drives with large system buffers. Of course, this may slow things down, but you may not be able to have both maximum speed and the completely accurate progress bars that unbuffered writes give you.


And what happens if you have a mixture of large files and lots of small files? If 10,000 small files hit the system's write buffer, is that any different to one large file hitting it? Will performance be degraded if those 10,000 small files are in the write buffer while an unbuffered large file is being copied to the drive at the same time, competing for the same resource?

[/li]
[li]If it doesn't:

Why bother with the extra logic at all? Just do everything unbuffered if a slow device is detected.[/li][/ul]

Are there any good benchmarks to prove the behaviour with small files, one way or the other?

Or to prove that the amount of system memory plays that big a factor in this scenario? (I have 16GB in this machine, but have also used ones with 1GB and 2GB recently.)

I think there is a lot of guessing going on in this thread (including my own guessing), and these theories need testing properly before we implement any changes.

What I have observed is that writing thousands of small files unbuffered leads to additional overhead, because the Master File Table (MFT) has to be updated for each file. On the other hand, if these files are written buffered, the MFT is updated in bulk, which performs much better, but the system cache may block processes due to the long flush duration.

I have a solution in mind that is based on a configurable threshold playing a double role:

  1. For a single file, the threshold specifies which copy mode should be used (buffered for smaller files, unbuffered for larger ones).

  2. If only smaller files are copied (but many of them), their sizes are summed, and once the threshold is reached DOpus forcibly flushes the write-cache.

This should work very well even for mixed-size filesets.

Going this way provides linear progress when copying large files, without filling the system cache with unwanted data, while thousands of small files are still copied quickly since the MFT can be updated in bulk. Additionally, the extra flush (when the threshold is reached) prevents the system cache from growing to its maximum.
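To make the double role concrete, here is a rough C++ sketch of the copy loop I have in mind (FileEntry, CopyBuffered, CopyUnbuffered and FlushTargetWriteCache are just made-up placeholders, and the 128 MB threshold is only an example value):

[code]
#include <cstdint>
#include <string>
#include <vector>

// Placeholders for the sketch only.
struct FileEntry { std::wstring path; uint64_t size; };
void CopyBuffered(const FileEntry&)   { /* normal cached write */ }
void CopyUnbuffered(const FileEntry&) { /* target opened with FILE_FLAG_NO_BUFFERING */ }
void FlushTargetWriteCache()          { /* e.g. FlushFileBuffers() on the target volume */ }

const uint64_t kThreshold = 128ull * 1024 * 1024;   // example default, user-configurable

void CopyFileSet(const std::vector<FileEntry>& files)
{
    uint64_t bufferedBytes = 0;                  // role 2: running total of buffered small files
    for (const FileEntry& f : files)
    {
        if (f.size >= kThreshold)                // role 1: per-file mode selection
        {
            CopyUnbuffered(f);
        }
        else
        {
            CopyBuffered(f);
            bufferedBytes += f.size;
            if (bufferedBytes >= kThreshold)     // cap the amount of dirty data in the cache
            {
                FlushTargetWriteCache();
                bufferedBytes = 0;
            }
        }
    }
}
[/code]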

Do you have any benchmarks or documentation to back up those theories? They sound plausible, but I've found this stuff often doesn't work the way you might assume.

(For example, I once spent a weekend writing a "fast copy" proof-of-concept which copied lots of small files in parallel instead of in series, assuming the per-file overhead was causing delays. It turned out to be no faster than doing things normally, because the filesystem was serialising the file creations anyway. Until ideas like that are tested, they're just ideas. Sometimes ideas to speed things up actually slow things down, too.)

Sure, I have run several tests on my own machine, and we have implemented this kind of thing in our own projects.

I performed a quick test on my own machine for you by copying files in the console and accessing the destination device in parallel with DOpus (only in the first run). You can easily perform these tests on your own machine using PowerShell. I repeated each test 20 times and recorded the fastest run.

MANY SMALL FILES TEST (BUFFERED + UNBUFFERED)

Source:
D:\Prj\ (contains 12,840 files, total size 987 MB)
2x Intel SSD (internal / RAID-0)

Target:
T:\Test
1x WD HDD (internal)

PS D:\> Measure-Command {xcopy.exe D:\Prj\*.* T:\Test\ /e}

BUFFERED: System lags a little and Perfmon shows caches growing. DOpus lags heavily; browsing the destination in parallel is only possible with short freezes (2-3 seconds).

Days : 0
Hours : 0
Minutes : 0
Seconds : 16
Milliseconds : 485
Ticks : 164852579
TotalDays : 0,000190801596064815
TotalHours : 0,00457923830555556
TotalMinutes : 0,274754298333333
TotalSeconds : 16,4852579
TotalMilliseconds : 16485,2579

PS D:\> Measure-Command {xcopy.exe D:\Prj\*.* T:\Test\ /e /j}

UNBUFFERED: System runs smoothly and Perfmon shows constant caches. DOpus lags a little bit, but browsing the destination in parallel is possible (near instant).

Days : 0
Hours : 0
Minutes : 3
Seconds : 28
Milliseconds : 757
Ticks : 2087572198
TotalDays : 0,00241617152546296
TotalHours : 0,0579881166111111
TotalMinutes : 3,47928699666667
TotalSeconds : 208,7572198
TotalMilliseconds : 208757,2198

SINGLE LARGE FILE TEST (BUFFERED + UNBUFFERED)

Source:
D:\Backup\System-0.tib (single file; 5.2 GB)
2x Intel SSD (internal / RAID-0)

Target:
T:\Test
1x WD HDD (internal)

PS D:\> Measure-Command {xcopy.exe D:\Backup\System-0.tib T:\Test\ /e}

BUFFERED: System lags a little and Perfmon shows caches growing. DOpus lags heavily; browsing the destination in parallel is only possible with long freezes (10-20 seconds).

Days : 0
Hours : 0
Minutes : 1
Seconds : 8
Milliseconds : 830
Ticks : 688309249
TotalDays : 0,00079665422337963
TotalHours : 0,0191197013611111
TotalMinutes : 1,14718208166667
TotalSeconds : 68,8309249
TotalMilliseconds : 68830,9249

PS D:\> Measure-Command {xcopy.exe D:\Backup\System-0.tib T:\Test\ /e /j}

UNBUFFERED: System runs smoothly and Perfmon shows constant caches. DOpus lags a little bit, but browsing the destination in parallel is possible (near instant).

Days : 0
Hours : 0
Minutes : 1
Seconds : 10
Milliseconds : 9
Ticks : 700095699
TotalDays : 0,000810295947916667
TotalHours : 0,01944710275
TotalMinutes : 1,166826165
TotalSeconds : 70,0095699
TotalMilliseconds : 70009,5699

In the past we implemented various copy operations that handle a lot of files in parallel (multithreaded), and we could in fact measure a performance gain, but only when we performed the copy operations manually using unbuffered I/O, and only on SSD or flash-memory based devices.

Please have a look at the 4K single-threaded benchmarks in comparison to the 4K-64 multi-threaded benchmarks done by AS SSD. Multithreaded access patterns provide a significant performance boost for modern SSDs. This benchmark accurately reflects what you can expect from your SSD when copying multithreaded (of course, you have to forget all this parallel stuff for old-fashioned HDDs).

alex-is.de/PHP/fusion/articl ... ticle_id=2
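Just to illustrate the parallel part: dispatching the copies across a small pool of worker threads could look like the C++ sketch below. It uses plain buffered std::filesystem::copy_file for brevity and omits error handling; as said above, we only saw real gains when the copies themselves were done manually with unbuffered I/O, and only on SSD/flash targets.

[code]
#include <atomic>
#include <filesystem>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Copy a flat list of files into dstDir using a few worker threads.
void ParallelCopy(const std::vector<fs::path>& files, const fs::path& dstDir, int workers = 4)
{
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;

    for (int i = 0; i < workers; ++i)
    {
        pool.emplace_back([&] {
            for (std::size_t idx; (idx = next++) < files.size(); )
            {
                fs::copy_file(files[idx], dstDir / files[idx].filename(),
                              fs::copy_options::overwrite_existing);
            }
        });
    }

    for (auto& t : pool)
        t.join();
}
[/code]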

The results of my tests show clearly that (at least on my machine) two things perform really badly:

  1. Copying thousands of small files unbuffered -> takes minutes (unbuffered) vs. seconds (buffered)

  2. Copying a single large file buffered -> no throughput gain, an inaccessible target, intermittent freezes and a messed-up cache

From this point of view, my double-role threshold idea makes sense to me.

Thank you, that looks very helpful.

It's a really big problem for me.
When I copy files to (very) slow drives (a USB key, NAS server or my mounted phone), the lister freezes and I need to open another lister.
I don't remember if it was the same with DO9, but it's really annoying. :frowning:

I can't seem to reproduce this with any of the USB drives I have here, so I'm wondering if it is a problem that's specific to a certain brand of USB drive. If you can tell me which actual brand/model lets you reproduce it I'll see if I can obtain one.

I can even reproduce the freezes by using xcopy to copy from an internal SSD to an HDD (no USB at all), with DOpus only used for browsing the target device (the one passed to xcopy). It's the same problem (except for the missing progress dialog), and it can easily be fixed by just passing the /j argument to xcopy. In that case DOpus runs smoothly and can browse the target device the whole time (while xcopy runs in the background). At least for me it doesn't depend on USB; I just need to fill the system cache faster than the destination can write to make the user interface of any application that depends on the target device stutter.

xcopy [...] /j Copies files without buffering. Recommended for very large files.

Unfortunately /j makes xcopy perform very badly for thousands of small files; have a quick look at my measurements here. I can certainly make another video clip showing the xcopy/DOpus setup if it is helpful for you (because it's easier to see the effect than to read about it).

I have already successfully reproduced this behaviour on around 15 machines (using DOpus in traveller mode) with very mixed hardware configurations (different CPU, HDD/SSD, video, network, but all of them with a large memory configuration and running Win7 or Vista in 64-bit).

Unfortunately I'm not able to reproduce these symptoms. As an example, I copied a 3.3GB file from an SSD to an HDD using xcopy (buffered, as in your example), and the destination was fully browseable in Opus the whole time with no perceived lags at all.

What kind of files are you copying? Are they large video files or archives, for example?

It definitely is not normal for a file copy between two fast devices to cause the second device to be unbrowsable. There must be something specific about the type of files, or the Opus configuration, or something extra installed on your system which is key to seeing the same delay.

For example, in the past I've seen issues where partially-copied video files would cause some video codecs to go crazy if Opus asked them for information about the files, when they appeared in a folder it was looking at, and that folder had columns which displayed info about video files (e.g. description, dimensions, bitrate, etc.).

Or maybe the target drive has volume shadow copy (system restore points / previous versions) enabled, and copying a large amount of data to it is forcing the OS to make room by clearing some of that data, which is slowing down access to the drive while it happens.

Or something low-level is choking on the amount of I/O. (Could be storage controllers -- both drivers and physical hardware -- or anti-virus etc. And with VMs that could include the host OS and real hardware/drivers as well as the guest OS and virtual hardware/drivers, and the VM software itself.)

There has to be something unusual like that going on as it just is not normal to see this on a HDD target. If it was, we all would be seeing it all the time when copying large files around. It's completely normal for the source device to read faster than the target device can write. Even without the SSD->HDD combo (which we have ourselves and exercise regularly without problems), that's true of most device combinations. Even the same brand of HDD as both source and target meets the criteria, since writing is almost always slower than reading for the same class of device.

Here's a video of me copying several large files from SSD to HDD while still being able to browse the target HDD, on a machine with 16GB RAM:

leo.dopus.com/temp/Copy_SSD_to_HDD.mp4 (23MB)

You can see the progress jumps straight to 2GB copied right at the start (I think the source files were probably still cached in memory after copying them to the SSD before the test), so a large write buffer is definitely in effect here, and the target drive is definitely writing slower than the source is providing the data.

Sorry, I too can't reproduce any of these problems. Going from internal SATA2 SSD to internal SATA2 7200RPM, External USB2.0 7200 & 5400 RPM and network 5900 RPM on an unRAID server which has to calculate and do the unRAID math at the same time. Large or lots of small files, never lags and I'm able to keep browsing/using Dopus. Running the latest Dopus and Win7 x64 Pro.

The progress bar lags on my system (Vista 64 / 8 GB) in a similar way to what is shown in Leo's video. It jumps stepwise from left to right and the throughput shows values ranging from 100 KB/s up to 80 MB/s. So the system's write cache seems to be filled and flushed alternately when copying large files, which doesn't make sense to me at all. Is there any advantage in pushing large files through the write cache when copying? Even if it doesn't freeze, the cache is filled with useless data, the progress shows artificial values, and the dialog disappears much too early!?

I played around a little bit more, and even though copying a single 8 GB file doesn't lead to any delays here (just a jumping progress bar), copying 16x 500 MB files to a USB stick produces 5-second freezes when accessing the stick in parallel with DOpus. The busy cursor is shown and I can't do anything; after 5 seconds the device is refreshed and everything is quick and fast again. This happens multiple times before the overall progress reaches the end and the dialog disappears. The progress increases stepwise in 500 MB steps.

My system is an old Core2Duo with 64-bit WinVista, 8 GB of memory, a 1 TB HDD and only USB 2.

How can I create such cool videos to show the effect?

[Note: Most of what I say below is about write buffering in the context of individual files, not the setting that affects the entire device, which is different.]

Turning off write caching is complicated. It will also slow down some operations, so it doesn't make sense to simply turn it off.

We might turn it off for some other operations, if we can reproduce the problem it is said to be causing and be confident that nothing will go wrong, but only then.

It's not that we have intentionally turned write caching on. It is on by default. Turning it off is an extra step and requires extra logic in the copy code which may make it fail on some devices.

It isn't a simple case of flicking a switch in the code. It is much more complicated than that. Not just because a cut-off point has to be chosen about which files should use the buffer and which should bypass it. Windows is designed to assume everything will use write buffering, except for highly specialised tasks where you want to manage the buffering yourself (e.g. databases). As soon as you turn off write buffering you have to deal with low-level details of the filesystem, which vary from device to device, and which are normally abstracted away by the OS. When a program writes files with buffering off, it has to be careful how it writes the data, or the writes may fail.
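To give a flavour of what that means, here is a minimal C++ sketch of writing one small file with buffering turned off (error handling omitted; in real code the sector size would be queried from the target volume, e.g. via GetDiskFreeSpace, rather than passed in, and the real copy code has to handle a lot more than this):

[code]
#include <windows.h>
#include <malloc.h>     // _aligned_malloc / _aligned_free
#include <string.h>

// With FILE_FLAG_NO_BUFFERING every WriteFile must use a sector-aligned buffer and a
// length that is a multiple of the sector size, so the data is padded to a sector
// boundary and the file has to be trimmed back to its real size afterwards.
void WriteUnbuffered(const wchar_t* path, const BYTE* data, DWORD size, DWORD sectorSize)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, nullptr);

    DWORD padded = ((size + sectorSize - 1) / sectorSize) * sectorSize;
    BYTE* aligned = static_cast<BYTE*>(_aligned_malloc(padded, sectorSize));
    memset(aligned, 0, padded);
    memcpy(aligned, data, size);

    DWORD written = 0;
    WriteFile(h, aligned, padded, &written, nullptr);

    _aligned_free(aligned);
    CloseHandle(h);

    // Reopen without the flag to set the true end-of-file (the unbuffered handle
    // could only write whole sectors).
    HANDLE h2 = CreateFileW(path, GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);
    SetFilePointer(h2, static_cast<LONG>(size), nullptr, FILE_BEGIN);
    SetEndOfFile(h2);
    CloseHandle(h2);
}
[/code]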

(IMO, if the write buffer really is causing problems for some people, then Microsoft should really provide a "small write buffer" mode which still gives you the basic abstractions -- so you can just write data into the file without worrying about sector sizes, data alignment and file sizes -- but this is Microsoft we're talking about and they probably already know this, don't care, and will never do anything to help. :slight_smile: In an age where multi-gig files are completely normal, it seems wrong that the operating system's default modes of writing data cause such problems and that no alternatives have been provided, but perhaps that is just how it is. I can't help feeling that something is being overlooked, though.)

Some of that is just the result of how/when the speed timings are made, combined with how the buffering works. Don't read too much into the "current" speed information; it is just for guidance. The average speed is more useful. The current speed is how fast Opus can write data into the write cache. If the write cache fills up then Opus will think it is no longer writing data, so the speed will be low for a moment, but the write cache is probably being written to disk (by the OS) at a fairly constant rate. There is no way to actually measure that rate, though. That information is not provided to applications. Apps can only know how much they have given to the operating system, not how much the OS has actually written to disk. (Unless buffering is turned off, which is not always what you want, as discussed above. And even with buffering turned off on the file, the HDD hardware itself still has its own write buffer...)
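As a rough illustration (just a sketch, not the actual Opus copy code): all a copy loop can count is the bytes that WriteFile accepts into the cache, so that is all any progress or speed figure can be based on.

[code]
#include <windows.h>
#include <cstdint>

// Copy data between two already-open handles, returning the byte count a progress
// display would be based on. Those bytes have been handed to the OS cache; there is
// no API that reports how many of them have actually reached the disk.
uint64_t CopyWithProgress(HANDLE hSrc, HANDLE hDst)
{
    BYTE buf[64 * 1024];
    DWORD read = 0, written = 0;
    uint64_t reported = 0;

    while (ReadFile(hSrc, buf, sizeof(buf), &read, nullptr) && read != 0)
    {
        WriteFile(hDst, buf, read, &written, nullptr);
        reported += written;   // counts writes into the cache, not onto the platters
    }

    // Only after an explicit flush do we know the cached data has hit the device
    // (and even then the drive's own hardware cache may still be in the way).
    FlushFileBuffers(hDst);
    return reported;
}
[/code]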

If you can reproduce the problem then please provide details for some of the things I asked about to help us try to reproduce what you're seeing.