Network copy to SMB share from Windows 10

Leo · January 10, 2019, 5:34pm

It's not that we aren't interested in this issue.

It's that almost every time it comes up, with a handful of exceptions, it has ended up coming down to something about how the machines/network are configured that is sensitive to buffer size, or some other thing that's outside our control, impossible for us to reproduce locally, and sometimes even favors Opus over Explorer when the wind blows in another direction.

That's when it isn't just the two programs being configured to do different things. (e.g. If you're copying a lot of small files and Opus is set to preserve dates or other metadata but Explorer isn't, then it has a huge impact, and few people think of such things.)

There have been cases where a real issue has been found and we've been able to improve copy speed. One example was with the progress dialogs being updated too often, in a way that was stalling the read/write threads from continuing when using very fast media, and also causing high CPU usage during copies. So there certainly are places where improvements can be made, and we're more than happy to make them, but we're also weary from many years of threads like this where it has turned out to be factors outside our control.
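To give a rough idea of what "updating the progress dialog less often" means in practice, here is a minimal Python sketch of rate-limiting progress updates. It is only an illustration, not the actual Opus code: the class name `ThrottledProgress`, the `min_interval` value and the callback are all made up for the example.

```python
import time


class ThrottledProgress:
    """Forward progress updates to a UI callback at most once per interval.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, callback, min_interval=0.1, clock=time.monotonic):
        self._callback = callback          # e.g. a function that redraws a dialog
        self._min_interval = min_interval  # minimum gap between UI updates
        self._clock = clock                # injectable so it can be tested
        self._last = None                  # time of the last forwarded update
        self._pending = None               # newest value that was skipped

    def update(self, bytes_done):
        """Called by the copy loop after every chunk; cheap when throttled."""
        now = self._clock()
        if self._last is None or now - self._last >= self._min_interval:
            self._last = now
            self._pending = None
            self._callback(bytes_done)
        else:
            # Remember the value, but do not touch the UI yet.
            self._pending = bytes_done

    def finish(self):
        """Flush the last skipped value so the dialog ends on the true total."""
        if self._pending is not None:
            self._callback(self._pending)
            self._pending = None
```

The copy loop can then call `update()` after every buffer without the UI work slowing down the reads and writes, and call `finish()` once at the end.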

We're looking into getting some 10gig network hardware but it's literally 10x the price of 1gig (and that's for the low-end stuff that's limited to connecting only two rooms/machines). It's not mainstream at all, which is a shame as 1gig is not fast enough even for HDDs these days. (If 10gig was mainstream, I'd have my entire setup on it already.)

In the meantime, I did a test using two RAM disks and localhost networking.

RAM disks are even faster than SSD, obviously, so that should tell us something about the maximum throughput Opus can reach.
localhost networking won't simulate all the network hardware between machines, of course, but it means the data is at least going through the network stack in Windows, even if it gets to take a huge shortcut without hitting a real network. Best we can really do at short notice without 10gig hardware.
Opus isn't involved in sending network packets or low-level filesystem access anyway, so if copying to a share on localhost behaves differently from copying to a real network share, that suggests the operating system or network hardware is not buffering the data correctly for smooth transmission.

If there is a way we can modify our side to make it run more smoothly, we're open to that, even if the problem really is that the OS is failing to do its side properly, but it's difficult to guess what that might be, especially without being able to reproduce the problem locally and play with different buffer sizes and so on.

The performance-critical file copy code is very simple. The code that happens before and after it to decide what to copy where and with which name & metadata, as well as error handling, streaming from archives, etc., is complex, but the main copy loop is very simple:

We call ReadFile and WriteFile in a loop, and send updates to the progress dialog every so often.
If non-buffered I/O is turned off (the default), then both calls are done on the same thread, but the OS is told to perform readahead buffering. If the OS is doing its job, the reads will be buffered in parallel to the writes.
If non-buffered I/O is turned on, we tell the OS not to do its own buffering on the read side, and do it ourselves. A separate thread calls ReadFile in parallel to the main thread calling WriteFile.
Note that non-buffered I/O is never used for the network share side of things, even when enabled in Preferences, so in today's test it isn't relevant to the write side of things, since we're copying to a network share.
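The two modes described above can be sketched roughly like this. This is a portable Python illustration of the general shape only, not the Opus source: it uses ordinary file objects in place of ReadFile/WriteFile, and the function names, the 1MB `CHUNK` size and the queue `depth` are assumptions picked for the example.

```python
import queue
import threading

CHUNK = 1024 * 1024  # assumed buffer size, chosen only for illustration


def copy_single_thread(src, dst, chunk=CHUNK, progress=None):
    """Read and write on one thread; any read-ahead is left to the OS."""
    total = 0
    while True:
        data = src.read(chunk)
        if not data:          # empty read means end of file
            break
        dst.write(data)
        total += len(data)
        if progress:
            progress(total)
    return total


def copy_with_reader_thread(src, dst, chunk=CHUNK, depth=4, progress=None):
    """A reader thread fills a bounded queue while the caller does the writes."""
    buffers = queue.Queue(maxsize=depth)  # bounds how far reads can run ahead
    errors = []

    def reader():
        try:
            while True:
                data = src.read(chunk)
                buffers.put(data)         # blocks when the writer falls behind
                if not data:              # empty buffer doubles as end marker
                    return
        except Exception as exc:          # hand read errors to the writer side
            errors.append(exc)
            buffers.put(b"")

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = 0
    while True:
        data = buffers.get()
        if not data:
            break
        dst.write(data)
        total += len(data)
        if progress:
            progress(total)

    thread.join()
    if errors:
        raise errors[0]
    return total
```

In the first version the reads only overlap the writes if the OS does read-ahead for us; in the second, the overlap is done explicitly by the reader thread, with the queue depth limiting how much data is held in memory.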

I set up two 10GB RAM drives, V: and W:, located in physical memory (not the page file) using ImDisk:

A network share pointing to the W: disk was created, and I then copied a ~10GB file from the V: drive to the network share, using Opus, then Explorer, then Opus, then Explorer again. (Since everything is in RAM, caching should not play much of a part, unlike normal tests, so the order of the tests shouldn't matter too much.)

Here's a video showing what happened:

If anything, I'd say Opus is faster than Explorer in this case, just based on the progress dialogs. The two are close enough that I'd just call it even. (Unfortunately, Task Manager doesn't show performance graphs for this kind of disk, so I had to go by the progress dialogs, but those usually favor Explorer and disadvantage Opus, due to Explorer closing the dialog before buffers have been fully flushed, so if Opus looks faster/equivalent here, I'd say it is.)

We're seeing copy speeds of around 2GB/s (2 gigabytes a second) to a (localhost) network share there, copying the ~10GB file in about 3 seconds (or about 4-5 seconds in Explorer's case).

So whatever else is going on, the copy code in Opus is not limited to 350MB/s, we can definitely say that.
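For anyone who wants to check the numbers, here is the back-of-envelope arithmetic as a small Python sketch. The timings are only rough readings from the progress dialogs, decimal units are assumed (1GB = 1000MB), and the link figures are theoretical line rates before any protocol overhead. The helper names are made up for the example.

```python
def throughput_mb_s(size_gb, seconds):
    """Average copy speed in MB/s for a file of size_gb copied in `seconds`."""
    return size_gb * 1000 / seconds


def link_limit_mb_s(gigabits_per_second):
    """Theoretical ceiling of a network link in MB/s (8 bits per byte)."""
    return gigabits_per_second * 1000 / 8


if __name__ == "__main__":
    # ~10GB file, using the approximate 3 to 5 second range seen in the test.
    for seconds in (3, 4, 5):
        print(f"10GB in {seconds}s: {throughput_mb_s(10, seconds):.0f} MB/s")

    # Line rates of the two kinds of network hardware mentioned earlier.
    for gbit in (1, 10):
        print(f"{gbit}gig link limit: {link_limit_mb_s(gbit):.0f} MB/s")
```

Even at the slow end of that range the copy averages about 2000MB/s, which is far above 350MB/s and also above what a 10gig link could carry.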