Opus freezes during large USB file transfers

[quote]Is that based on observations or just an assumption? Microsoft use an even smaller threshold of 256KB in Windows Explorer, for example.[/quote]

It is based on observations, and I could confirm it on my own system (see below). As you can see in the test results, copying thousands of small files unbuffered is very smooth but takes 15 times longer than copying the same files buffered (which fills up the caches and leads to the freezes).

MANY SMALL FILES TEST (BUFFERED + UNBUFFERED)

Source:
D:\Prj\ (contains 12,840 files, total size 987 MB)
2x Intel SSD (internal / RAID-0)

Target:
T:\Test
1x WD HDD (internal)

PS D:\> Measure-Command {xcopy.exe D:\Prj\*.* T:\Test\ /e}

BUFFERED: The system lags a little and Perfmon shows the caches growing. DOpus lags heavily; browsing the destination in parallel is only possible with short freezes (2-3 seconds).

Days : 0
Hours : 0
Minutes : 0
Seconds : 16
Milliseconds : 485
Ticks : 164852579
TotalDays : 0,000190801596064815
TotalHours : 0,00457923830555556
TotalMinutes : 0,274754298333333
TotalSeconds : 16,4852579
TotalMilliseconds : 16485,2579

PS D:\> Measure-Command {xcopy.exe D:\Prj\*.* T:\Test\ /e /j}

UNBUFFERED: The system runs smoothly and Perfmon shows constant cache usage. DOpus lags a little bit, but browsing the destination in parallel is possible (near instant).

Days : 0
Hours : 0
Minutes : 3
Seconds : 28
Milliseconds : 757
Ticks : 2087572198
TotalDays : 0,00241617152546296
TotalHours : 0,0579881166111111
TotalMinutes : 3,47928699666667
TotalSeconds : 208,7572198
TotalMilliseconds : 208757,2198

I guess the questions there are:

  1. Is it a difference of perception rather than a difference in when the data actually finishes writing? i.e. when the thousands of small files are copied buffered, the xcopy command will finish once they've been written to the write cache but not yet to the disk. Disk activity may continue for a long time after the command completes.

  2. Is the difference due to more frequent MFT metadata updates, or just due to reducing the number of data writes to the disk (since with buffering, several small files may go to the disk in a single write, which won't happen without buffering, of course)?

It looks like there is an API to measure how often the MFT is being written to, so I might throw together a tool to investigate whether or not buffering makes a difference to that. I'll be surprised if it does as I think the buffering mode is to do with the file's data, not the metadata or MFT, but I might be wrong.

Wouldn't it be nice to occasionally flush the system's write cache just to keep the progress dialog in sync with the actual copy operation? Occasionally flushing (when copying thousands of small files) seems like a good idea to me because it would greatly increase the reliability of the progress bar and keep the dialog abortable. Furthermore, the throughput values would be more reliable.

On my system the dialog disappears while the caches are still flushing for 1-2 minutes after copying thousands of small files.

Personally I would prefer the dialog to disappear only when the data has actually been written, allowing me to unplug the USB stick immediately. Currently I have to keep an eye on the LED of my USB stick to get an idea of when I can remove it (and I hate that behaviour).

I would really appreciate the copy dialog disappearing when the data is actually written.

  • Flushing after every small file would slow things down.
  • I don't think Windows provides any way to flush individual file handles after they have been closed, so there doesn't seem to be a way to retroactively flush only the written files after every N files.
  • Flushing the write cache for an entire volume is possible, but it would affect other programs (undesirable) and requires full administrator access/elevation, so it isn't really suitable (see the sketch below).
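
For reference, here is a minimal sketch of what flushing a whole volume would involve (assuming the T: drive from the tests above, with error handling kept to a minimum). Opening the volume handle with write access is exactly the part that needs administrator rights, and the flush affects every program's pending writes on that volume, not just our copy:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Opening a volume handle (\\.\T:) with write access requires
    // administrator rights; without elevation CreateFile fails with
    // ERROR_ACCESS_DENIED.
    HANDLE hVolume = CreateFileW(L"\\\\.\\T:",
                                 GENERIC_READ | GENERIC_WRITE,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE,
                                 NULL, OPEN_EXISTING, 0, NULL);
    if (hVolume == INVALID_HANDLE_VALUE)
    {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    // Flushes the write cache for the entire volume, i.e. for everything
    // any program has written to it, not just our own copy operation.
    if (!FlushFileBuffers(hVolume))
        printf("FlushFileBuffers failed: %lu\n", GetLastError());

    CloseHandle(hVolume);
    return 0;
}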

If you know another way to make what you're asking for happen then we might be able to give it a try but I suspect the current setting is as good as it can get, and it's just a question of working out a good threshold size as a compromise between buffering and non-buffering behaviour.

Copying thousands of tiny files to poorly-behaved USB devices is never going to be perfect, whatever we try, I suspect.

Seriously, just turn off write caching on your USB stick if it causes you this many problems.

I wrote a small script that copies 5,000 files of 100 KB each. 99 files are copied buffered and the 100th file is copied unbuffered, in a loop until all files are copied. The system stays responsive and there are no freezes anymore. If I change the script to copy the 100th file buffered as well (all files buffered), the caches start growing and the freezes are back. Maybe the write operations to a single device are performed sequentially by the file system in a way that fixes this problem without explicitly flushing?

xcopy 1.bin T:\test
xcopy 2.bin T:\test
xcopy 3.bin T:\test
[...]
xcopy 99.bin T:\test
xcopy 100.bin T:\test\ /J
xcopy 101.bin T:\test
[...]
xcopy 199.bin T:\test
xcopy 200.bin T:\test\ /J
xcopy 201.bin T:\test
[...]

At least in my case I tested with an internal WD hard disk (no SSD) connected via SATA-2, as mentioned in my test setup:

Source:
D:\Prj\ (contains 12,840 files, total size 987 MB)
2x Intel SSD (internal / RAID-0)

Target:
T:\Test
1x WD HDD (internal)

[quote="AKA-Mythos"]I wrote a small script that copies 5.000 files of 100 KB size each. 99 files are copied buffered and the 100th file is copied unbuffered in a loop until all files are copied. The system keeps responsible and there are no freezes anymore. If I change the script to copy the 100th file buffered also (all files buffered), the caches start growing and the freezes are back. Maybe the write operations of a single device are performed sequentially by the file system in a way that fixes this problem without explicitly flushing?

xcopy 1.bin T:\test
xcopy 2.bin T:\test
xcopy 3.bin T:\test
[...]
xcopy 99.bin T:\test
xcopy 100.bin T:\test\ /J
xcopy 101.bin T:\test
[...]
xcopy 199.bin T:\test
xcopy 200.bin T:\test\ /J
xcopy 201.bin T:\test
[...][/quote]

I set up the same thing, and I don't see any difference in cache usage or behaviour. Here is a video:

leo.dopus.com/temp/xcopy_small_f ... fering.mp4

The source files were 544 copies of the same 155KB exe file, renamed to .bin.

Nothing was monitoring the source or destination folders. (Not open in any Opus window, etc., in case that caused the files to be opened and cached which would cause confusion when trying to look at the effects of xcopy only.)

copy_99_buffered.bat would xcopy the files from SSD to a HDD, using the /J arg on every 100th file.

copy_all_buffered.bat would xcopy the files from SSD to a HDD, never using the /J arg.

Behind the command prompt you can see Resource Monitor, where filesystem caches are attributed to the Physical Memory graph's Standby value.

On the right there's also a meter showing total disk I/O for the source and dest drives respectively, although it never goes very high anyway.

When I run either batch file, Standby memory usage increases very slightly. (544 x 155KB is only ~82MB so it's not going to be much whatever happens.)

When I delete the destination files, that slight increase seems to go away. (So the source files are probably cached anyway, and staying cached, and the increase is probably just for the destination files, and gets binned when the destination files are deleted, and re-allocated when they are copied again.)

So, yeah, I am really not seeing what you describe. I'm not sure how you would see significant cache usage or any performance issues from copying such a small amount of data, anyway, unless something in your system (hardware, drivers or antivirus) is handling the situation poorly.

Edit: I'll try again with 5000 files instead of 500, but I doubt it makes a difference, given I'm already seeing buffering happen and not be cancelled out with 500 whether or not /J is used every 100 files.

Yep, exactly the same with 5000 files:

  • Cache usage slowly increases whether or not xcopy is passed /J every 100 files.
  • Disk I/O is virtually zero, either way. (Suggesting that most of the overhead is in creating the files, and that how their data is copied once they are created is insignificant.)

[quote]Yep, exactly the same with 5000 files.[/quote]

OK, thanks for trying it.

I'm really annoyed because I don't know which driver or hardware component causes this strange behaviour on our systems.

We have hundreds of Dell and HP machines here and at least the systems I have access to are all affected.

Currently I'm creating a short video clip for you on a brand-new HP machine we unwrapped this morning.

OK, I retried your test case with many hundreds of files (random data generated as *.bin) and the unbuffered threshold set to 1 MB.

Test-Run 1)

I created the binary files with a size of 1.2 MB each. Then I copied these files from internal hard disk D: to internal hard disk H:, and the copy operation was perfectly smooth. I can even click the abort button at any time, and the throughput is correctly shown at 78 MB/s. Everything feels responsive and looks very nice.

Test-Run 2)

For this run I created the binary files with a size of 0.8 MB each. Then I copied these files from internal hard disk D: to internal hard disk H:, and you can see how it performs on our hardware here. I re-ran the test on three different systems (Dell and HP); it's all the same.

dropbox.com/s/roc2ub9rqo0tm ... esCopy.mov

Anyway, it obviously must depend on some specific hardware or driver configuration, since you cannot reproduce the problem.

But it's also obvious that unbuffered writing completely fixes these damn caching issues on our systems, since the 1.2 MB files copy perfectly.

Unfortunately I cannot force the unbuffered mode all the time, because copying small files then takes 15 times longer (and I do that many times per day).

So the only option left is to beg here in the forum for an extra flush when copying small files (buffered), to avoid the growing caches.

It makes me wonder how your computers are usable at all - small files are opened, created, read and written all the time on a normal Windows machine. Presumably you don't just boot your computers and let them sit there untouched except for the occasional burst of heavy file copying. Surely, if file caching was as much of a problem as you seem to think it is, your systems would just get gradually more and more unusable over time until they'd eventually grind to a halt all on their own.

We only use Dell and HP systems in standard configuration as they can be bought from the online shop for big companies.

I assume that copying thousands of small files isn't a daily use case for 99% of the users.

Maybe the problem is related to the high throughput and the large memory these machines have. For example, I can read about 1100 MB/s from the primary RAID-0 on the system currently sitting in front of me. I do not know how that might cause or amplify issues that other people will never see.

Anyway these systems are rocket fast and reliable unless we copy thousands of small files or a single large file.

Fortunately you have already provided a fix that lets us copy large files rocket fast, and now we are hoping for a fix that brings the same performance to small files as well. :slight_smile:

Since your company seems to be buying these Dell & HP machines (and Windows licences) by the crate, maybe it's worth opening a support ticket with the hardware providers and/or Microsoft to get them to investigate why decent hardware is performing so poorly (and unlike other setups) when copying lots of small files.

It seems like there is an IO bottleneck somewhere in your setup. (It could be something common to all the machines, like a Windows policy setting or an extra component that you install as standard.)

After all, this happens with every single file copying program, not just Opus, and you have problems whether or not buffering is used. (If the voodoo of turning off buffering every 100 files works with xcopy on your machine, I think the vendors need to look at your machines and work out why that is, as there's no reason it should and it definitely doesn't do the same thing here.)

That's an option. I will ask a system administrator here. But since multiple hardware configurations are affected, it will be difficult to write a precise issue report.

I'm pretty sure that many people are affected, but most of them will never notice, and even if they notice some strange behaviour they won't mind, because computers do strange things that nobody understands all day long. :wink:

Anyway, DOpus doesn't freeze anymore when copying large files here, so that was a big step and it makes us really happy.

Even if copying thousands of small files keeps causing freezes for the same reason, we have a workaround by forcing unbuffered mode (via a changed hotkey).

Hmmm, but from my point of view I think a reliable progress dialog is very important!!

I think it would be worth spending some extra time to forcibly flush the system write cache when writing small files (e.g. flush every 100 MB), just to get correct throughput and progress indicators in the dialog.

Forget all these freezes!

Personally I hate dialogs that disappear much too early or show magic throughputs or jumping progress bars. If I understand the discussion correctly, the quality of the progress display would be greatly increased by introducing these extra flushes to synchronize the dialog with the actual copy state. To me that alone makes it worth doing.

If I currently copy 1000 files of 500 KB each, the progress shows imaginary values and disappears much too early. But thanks to the new unbuffered mode, my system now shows very accurate progress in the DOpus copy dialog for large files, and I have fallen completely in love with that. Unfortunately, when copying a lot of small files, the buffered mode still shows imaginary values, and I hate that. :wink:

I'm pretty sure that DOpus users would appreciate correct progress when copying a few large files as well as a lot of small files.

At least I would like to see accurate values all the time, and in particular I want to be able to pull out an external device as soon as the copy dialog closes, instead of waiting for the system caches to be flushed.

With small files, you can have speed or you can have an accurate progress display. Pick one. :slight_smile:

There is no way to do that, as I already explained above, unless you know of an API in Windows to do what you are asking for, in which case let us know the details and we'll try it.

Even if flushing were possible, you'd then have delays where the progress dialog stopped updating while waiting for data to flush to disk, so I am not really sure what you'd gain and it would slow things down, either way.

As Jon said, you should turn off write-caching on your USB stick if it is causing you so many problems with that device.

Also, if you want to know when it is safe to remove a USB-stick, that is what Safely Remove Hardware is for.

I assume that you use CopyFile() for small files (letting Windows handle everything) and WriteFile() for large files (passing FILE_FLAG_NO_BUFFERING and handling the buffers yourself).

Instead you would always use WriteFile(). But for small files you simply do not pass FILE_FLAG_NO_BUFFERING, and you call FlushFileBuffers() for every 100th file before closing it. Since the write operations of a single process should not overtake each other, this should work.

If that doesn't work, you can keep the file handles open until a threshold is reached (e.g. 100 handles) and then call FlushFileBuffers() on all of them before closing them. Since most of them have already been written, this should take almost no time.

Anyway, the point is to use WriteFile() so you keep control over buffered IO and can simply flush the handles as you want.
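
To make the idea concrete, here is a minimal sketch of that approach under the above assumptions (the function names and the 64 KB buffer are just placeholders; whether the every-100th-file flush really drains the cached writes of the 99 files before it is exactly the point being debated):

#include <windows.h>

// Copy one file with plain buffered WriteFile() calls (no
// FILE_FLAG_NO_BUFFERING) and return the still-open destination handle,
// or INVALID_HANDLE_VALUE on failure.
static HANDLE CopyOneFileBuffered(const wchar_t *src, const wchar_t *dst)
{
    HANDLE hSrc = CreateFileW(src, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (hSrc == INVALID_HANDLE_VALUE)
        return INVALID_HANDLE_VALUE;

    HANDLE hDst = CreateFileW(dst, GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hDst == INVALID_HANDLE_VALUE)
    {
        CloseHandle(hSrc);
        return INVALID_HANDLE_VALUE;
    }

    BYTE buf[64 * 1024];
    DWORD got, put;
    while (ReadFile(hSrc, buf, sizeof(buf), &got, NULL) && got > 0)
        WriteFile(hDst, buf, got, &put, NULL);

    CloseHandle(hSrc);
    return hDst;
}

void CopyManyFiles(const wchar_t **src, const wchar_t **dst, int count)
{
    for (int i = 0; i < count; i++)
    {
        HANDLE hDst = CopyOneFileBuffered(src[i], dst[i]);
        if (hDst == INVALID_HANDLE_VALUE)
            continue;

        // Every 100th file: flush this handle before closing it.  That the
        // flush also drains the cached writes of the previous 99 files is
        // the poster's assumption, not a documented guarantee.
        if ((i + 1) % 100 == 0)
            FlushFileBuffers(hDst);

        CloseHandle(hDst);
    }
}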

You assume wrongly. We never use CopyFile for anything and already use WriteFile for everything (always have). The beta version already selectively turns off buffering for all but small files, and I already explained why the other suggestions won't work.
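
Purely for illustration (the real Opus code is not public), "selectively turns off buffering for all but small files" boils down to a size check along these lines; note that FILE_FLAG_NO_BUFFERING also obliges the caller to use sector-aligned buffers and write sizes, which is omitted here:

#include <windows.h>

// Hypothetical helper: choose destination-file flags from a size threshold
// (e.g. the 1 MB "nonbuffer_threshold" discussed in this thread).
static DWORD PickDestFlags(ULONGLONG fileSize, ULONGLONG nonbufferThreshold)
{
    if (fileSize >= nonbufferThreshold)
        return FILE_FLAG_NO_BUFFERING;   // large file: bypass the system cache
    return FILE_ATTRIBUTE_NORMAL;        // small file: normal buffered writes
}

// Usage sketch when opening the destination file:
//   HANDLE hDst = CreateFileW(dst, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
//                             PickDestFlags(size, 1024 * 1024), NULL);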

Maybe FlushFileBuffers() is a little too heavy! I propose a simpler approach for a first try.

Keep everything as it is (using CopyFile() for small files) but introduce two additional thresholds for small-file copying:

(1) Max small file count before sync: This value specifies the number of small files that can be copied before the file cache has to be forcibly flushed.

(2) Max small file size before sync: This value specifies the total size of small files that can be copied before the file cache has to be forcibly flushed.

Personally I would recommend introducing (1) as a new advanced option, but using the value of the newly introduced "nonbuffer_threshold" for (2) as well.

Finally, you just have to count the number and total size of the small files copied to detect when a threshold is reached and an additional flush has to be performed.

This additional flush can be performed easily by copying the next small file using WriteFile() instead of CopyFile() and passing FILE_FLAG_WRITE_THROUGH only (do not pass FILE_FLAG_NO_BUFFERING). The write-through flag means that the system cache is still used (the data to be written is appended to the end of the cache), but the calling thread does not return until the data (and all previously submitted data) has actually been written to the disk. You can think of that behaviour as a kind of indirect flush.

In other words, the file-system cache is filled by CopyFile() as before, but every n-th file is copied using WriteFile() in write-through mode, which appends the data to the end of the cache and suspends the calling thread until the previously submitted data (from CopyFile) as well as the current data (from WriteFile) has actually been written to the disk.

msdn.microsoft.com/en-us/library ... g_behavior
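
For illustration only, a minimal sketch of that proposal (CopyFileWriteThrough and filesPerSync are made-up names; whether the write-through copy really acts as an indirect flush of the data CopyFile() cached beforehand is the assumption under discussion, not something the documentation promises):

#include <windows.h>

// Copy one file via WriteFile() with FILE_FLAG_WRITE_THROUGH on the
// destination: the data still goes through the cache, but WriteFile()
// does not return until it has reached the disk.
static BOOL CopyFileWriteThrough(const wchar_t *src, const wchar_t *dst)
{
    HANDLE hSrc = CreateFileW(src, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (hSrc == INVALID_HANDLE_VALUE)
        return FALSE;

    HANDLE hDst = CreateFileW(dst, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                              FILE_FLAG_WRITE_THROUGH, NULL);
    if (hDst == INVALID_HANDLE_VALUE)
    {
        CloseHandle(hSrc);
        return FALSE;
    }

    BYTE buf[64 * 1024];
    DWORD got, put;
    BOOL ok = TRUE;
    while (ReadFile(hSrc, buf, sizeof(buf), &got, NULL) && got > 0)
        ok = ok && WriteFile(hDst, buf, got, &put, NULL);

    CloseHandle(hSrc);
    CloseHandle(hDst);
    return ok;
}

void CopyManyFiles(const wchar_t **src, const wchar_t **dst,
                   int count, int filesPerSync /* e.g. 100 */)
{
    for (int i = 0; i < count; i++)
    {
        if ((i + 1) % filesPerSync == 0)
            CopyFileWriteThrough(src[i], dst[i]);  // the proposed "indirect flush"
        else
            CopyFileW(src[i], dst[i], FALSE);      // normal buffered copy
    }
}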