Network copy to SMB share from Windows 10

@b-s-ger yes, I get an improvement to around 650-700MB/s in both directions. Definitely better. I tried smaller buffers in 50MB increments and 500MB seems to be the best.

Having said that, copying with such a large buffer causes Opus to go sluggish when browsing other network shares during the copy process. I have reverted to the normal buffer size for now.

Here is a recording of me copying from a W10 VM on a 2GB/s+ NVMe drive to another W10 VM using a RAM disk with the folder TEST shared.

You can see Opus is a good deal slower than TC. Approx half the speed.

I have an update on the issue on my Gigabit connection. I have an Intel I210 on both ends. Today I had some time, so I updated the driver on both ends to the latest version Intel offers (23.5.1). I reset the buffer size to the default 512 KB and copied the Windows 10 ISO.

Copy TO the server with Dopus is peaking at ~80 MB/sec
Copy TO the server with TC is at full Gbit speed 100 MB/sec

Now the strange improvement since I upgraded the drivers.

Copy FROM the server with Dopus is now at 100 MB/sec (!)
Copy FROM the server with TC is what it was before 100MB/sec

Something has changed since I upgraded the drivers.

If I put in a higher buffer number, like 200 MB I see better speeds copying to the server, but I also see a strange zigzag line in Task Manager. The higher buffer also worsens the copy speed from the server because the speed fluctuates a bit.

Conclusion: Dopus somehow reacts to the different driver versions and TC does not. Can this help you track down the root cause of the difference between the programs?

I run a KVM virtio virtual 10Gb interface, and there has been no new driver update since 08/2018.
Granted, there may be some way of optimizing further but I'm not so sure it's my system at fault.
If Opus was 10-15% slower than other FMs like TC and also Windows itself, then fine, I can accept that.

As mentioned in the other thread about this, it's possible TC (when not in its "large file mode" which is reportedly the same speed or slower than Opus in similar situations) is using the shell file copy API, in which case it's not really Opus vs TC vs Explorer, it's just Opus vs Explorer.

If the network hardware drivers (or some other part of the chain) have only been optimised for that one method of copying files, then they'll be slowing down almost literally every other program. e.g. Photoshop or Premiere loading/saving files to the network are likely to be affected the same way Opus is.

If people are seeing such massive differences based on network driver versions then that points there as at least part of what may be going on.

I'm also thinking about writing a test program that tries a few different ways of reading / writing data so people can run it on their systems and we can see how much effect they have and whether one method or another seems to work best across multiple systems. (It's possible it will vary depending on the system, and may even completely fail to work on some devices, as we found with non-buffered I/O.)
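As a rough illustration of what such a test program could look like, here is a minimal Python sketch. It is not the tool described above and says nothing about how Opus copies internally; the function names (`copy_with_buffer`, `time_copy`, `run_benchmark`) and the chosen buffer sizes are made up for this example. It simply times a plain read/write loop at a few buffer sizes against Python's own `shutil.copyfile`, so the numbers can be compared on one machine.

```python
import os
import shutil
import tempfile
import time


def copy_with_buffer(src, dst, buffer_size):
    """Copy src to dst using plain read/write calls with a fixed buffer size."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(buffer_size)
            if not chunk:
                break
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # ask the OS to push cached data out before we stop the clock


def time_copy(copy_func, src, dst):
    """Return (seconds, MB/s) for one copy of src to dst."""
    size = os.path.getsize(src)
    start = time.perf_counter()
    copy_func(src, dst)
    elapsed = time.perf_counter() - start
    return elapsed, size / (1024 * 1024) / max(elapsed, 1e-9)


def run_benchmark(src, dst_dir, buffer_sizes):
    """Time each copy method once and return {label: (seconds, MB/s)}."""
    methods = {
        f"read/write {b // 1024} KB": (lambda s, d, b=b: copy_with_buffer(s, d, b))
        for b in buffer_sizes
    }
    methods["shutil.copyfile"] = shutil.copyfile
    results = {}
    for label, func in methods.items():
        dst = os.path.join(dst_dir, "bench.tmp")
        results[label] = time_copy(func, src, dst)
        os.remove(dst)  # start every run with no destination file
    return results


if __name__ == "__main__":
    # Hypothetical demo: a 16 MB temp file copied within the temp directory.
    # Point dst_dir at a network share to test the case discussed in this thread.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(16 * 1024 * 1024))
        sizes = [512 * 1024, 8 * 1024 * 1024]
        for label, (secs, mbps) in run_benchmark(src, tmp, sizes).items():
            print(f"{label:>24}: {secs:.3f} s, {mbps:.1f} MB/s")
```

A single run per method is noisy, and the source file may already be in the OS cache on later runs, so treat the output as a rough comparison only.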

Not sure exactly when/if we will do that, as we're in the middle of some other work at the moment, but it would be nice to get a better understanding of why some setups seem to be unable to buffer reads/writes properly. The drivers and operating system really should take care of this and abstract applications from it, since it's going to slow down almost literally everything that writes large amounts of data to the affected drivers, but maybe we can't rely on them to do so.

Microsoft certainly put a lot of time into speeding up the way Explorer copies in some situations, rather than fixing the low-level APIs so that all ReadFile/WriteFile access is similarly fast, which is a shame, but maybe the reality we're in, especially if people writing the network drivers and NAS etc. only test against what Explorer does.

Explorer is the easiest thing to benchmark (and also a misleading benchmark in some ways), but I suspect you'll find that if you benchmark how other software is performing with large file opening/saving, you'll find your setup is slowing them down as well. I may be wrong, of course.

The zig-zag line will be due to buffering, and the way Task Manager samples performance data. The overall throughput should normally be about the average of the zig-zag graphs, but sometimes Task Manager will sample the operation when it's writing into a buffer, and sometimes it will sample it when it is waiting for that buffer to finish being sent over the network.

There are multiple buffers in different places, so that's an oversimplification, but it's the gist of what's probably happening, at least if the overall transfer speed is close to the slowest of the network and drives' maximum. The peaks are presumably showing faster speeds than the chain of hardware can actually achieve, since they're measuring the speed something is writing into a buffer (or series of buffers) rather than the true end-to-end speed (which is hard for something like Task Manager to measure).

(It's also possible the zig-zags indicate the chain is stalling / going idle because it's not being fully utilised, but not if the overall speed matches what you'd expect of the slowest hardware in the chain, since you can't go faster than that.)

This is why it's best to measure the length of time that the network is active in Task Manager and to not pay too much attention to the maximum height of the graphs. Overall transfer time is what we care about, rather than max speed. Max speed is really just the maximum speed something can write into a buffer that then sits waiting for the rest of the chain to transmit the data. Task Manager is still a good tool to use, since it lets you see how long the transfer really takes (better than comparing different programs' progress dialogs, at least).
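To make the buffering point concrete, here is a toy Python simulation. It is a deliberately simplified model with a single buffer, and all the numbers (a 2000 MB file, a 200 MB buffer, a 400 MB/s fill rate, a 100 MB/s link) and the function name `simulate_copy` are assumptions chosen for illustration; it does not model Task Manager or any real driver. The program fills the buffer quickly, stalls while the link drains it, and a once-per-second sampler records how fast data went into the buffer.

```python
def simulate_copy(total_mb, buffer_mb, fill_rate, link_rate, dt=0.01, sample_every=1.0):
    """Return (total_seconds, samples) for a copy through one buffer.

    samples holds the MB/s written INTO the buffer in each sampling window,
    which is what a monitor on the sending side might plausibly show.
    """
    written = 0.0   # MB the program has handed to the buffer
    buffered = 0.0  # MB waiting in the buffer
    filling = True  # False while the program is stalled on a full buffer
    t = 0.0
    window_written = 0.0
    next_sample = sample_every
    samples = []
    while written < total_mb or buffered > 1e-9:
        if filling and written < total_mb:
            chunk = min(fill_rate * dt, buffer_mb - buffered, total_mb - written)
            written += chunk
            buffered += chunk
            window_written += chunk
            if buffered >= buffer_mb - 1e-9:
                filling = False  # buffer full: wait for the link to drain it
        # The link drains the buffer at its own, slower rate.
        buffered -= min(link_rate * dt, buffered)
        if not filling and buffered <= 1e-9:
            filling = True  # buffer empty again: resume writing
        t += dt
        if t >= next_sample - 1e-9:
            samples.append(window_written / sample_every)
            window_written = 0.0
            next_sample += sample_every
    return t, samples


if __name__ == "__main__":
    total = 2000.0
    seconds, samples = simulate_copy(total, buffer_mb=200.0, fill_rate=400.0, link_rate=100.0)
    print(f"overall: {seconds:.1f} s, average {total / seconds:.1f} MB/s")
    print(f"sampled peak: {max(samples):.0f} MB/s, sampled low: {min(samples):.0f} MB/s")
```

In this model the sampled values swing well above and below the link rate, while the total time stays pinned to what the slowest part of the chain allows, which is why the overall duration is the number worth comparing.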

I searched through the TC Forum because I remember TC had similar problems in the past. This was back in 2011.
The Speedcommander forum also has similar entries.

I also suspect Microsoft is keeping some information to themselves, and that Explorer makes some use of CopyFileEx or other APIs that you can't use or that they don't document. If you ever get around to writing that test program, I'd be glad to run it and help track down the root cause.

What about using CopyFileEx instead? Maybe this could be implemented in DOpus, so that when copying to network drives/shares, DOpus would use CopyFileEx instead of the built-in handler.

So just for the network side of things, for the rest DOpus could use the current implementation. Would that be too complicated to implement?
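For anyone who wants to try CopyFileEx on their own setup, here is a minimal Python sketch that calls the Win32 `CopyFileExW` function through `ctypes`. It only shows the API call itself; it is not how DOpus copies files, and the helper name `copy_via_system_api` is made up for this example. On non-Windows systems it falls back to `shutil.copyfile`, purely so the sketch runs anywhere.

```python
import ctypes
import os
import shutil
import sys


def copy_via_system_api(src, dst):
    """Copy src to dst with CopyFileExW on Windows, shutil.copyfile elsewhere."""
    src, dst = os.fspath(src), os.fspath(dst)
    if sys.platform != "win32":
        # Fallback is an assumption of this sketch; CopyFileEx exists only on Windows.
        shutil.copyfile(src, dst)
        return
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    copy_file_ex = kernel32.CopyFileExW
    copy_file_ex.argtypes = [
        ctypes.c_wchar_p,  # lpExistingFileName
        ctypes.c_wchar_p,  # lpNewFileName
        ctypes.c_void_p,   # lpProgressRoutine (None: no progress callback)
        ctypes.c_void_p,   # lpData (passed to the callback, unused here)
        ctypes.c_void_p,   # pbCancel (None: no cancellation flag)
        ctypes.c_uint32,   # dwCopyFlags (0: default behaviour, overwrite allowed)
    ]
    copy_file_ex.restype = ctypes.c_int  # BOOL: non-zero on success
    if not copy_file_ex(src, dst, None, None, None, 0):
        raise ctypes.WinError(ctypes.get_last_error())
```

Timing this call against a plain read/write loop on the same share would show whether the system copy routine really is faster on a given setup.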

It's planned: