Network copy to SMB share from Windows 10

Possibly, yes, but we can only guess as we don't have the same setup.

I set up a Windows Server 2016 VM with file shares on the same NVMe drive that holds the Samba shares.
I performed the same copy test (Windows 10 VM to Windows Server VM share). Opus copies at about 350MB/s, while Total Commander copies at 900MB/s. The issue does not appear to be restricted to Linux Samba shares; it also affects shares hosted on Windows Server.

Please use Task Manager, and either the Network or Disk graphs (whichever best corresponds to the activity in question in your setup) to time how long the operation really takes.

Always measure the time it takes for the activity to stop in Task Manager, rather than when the progress dialog closes, or speeds reported in the dialogs.

Tests also need to be done a few times to rule out caching and background tasks.

I don't have a virtual machine on an NVMe drive, and my LAN is only 1gbit, so I can't test at the same speeds you're using, but a test to a VM using a fast SSD shows both Opus and Explorer take exactly the same time (1 minute) to copy a 12GB file to a Windows share in a VM.

That was using my copy buffer settings:

[Image: screenshot of the copy buffer settings]

Those are not the default settings; I've enabled non-buffered I/O, which can improve things but can also create compatibility issues with some devices/filesystems, which is why it is not enabled by default.

I'd also try setting copy_buffer_size to values like 64 KB and 5 MB to see if larger or smaller buffers are preferred on your setup. (You can also use Process Monitor on Explorer to see which buffer size it is using to copy to the destination, to try the same size in Opus.)
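To get a rough feel for how sensitive a particular share is to buffer size, independent of any file manager, a small script along these lines could be used. This is only a sketch: `timed_copy` and `sweep_buffer_sizes` are made-up names, the loop is a plain read/write loop rather than anything Opus or Explorer actually does, and the paths in the usage comment are placeholders.

```python
import os
import time


def timed_copy(src, dst, buffer_size):
    """Copy src to dst in buffer_size chunks and return the elapsed seconds."""
    start = time.perf_counter()
    # buffering=0 on the read side so each read() asks the OS for one chunk.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(buffer_size)
            if not chunk:  # empty bytes means end of file
                break
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # count the time needed to flush the data
    return time.perf_counter() - start


def sweep_buffer_sizes(src, dst, sizes):
    """Time one copy per buffer size; returns {buffer_size: seconds}."""
    return {size: timed_copy(src, dst, size) for size in sizes}


# Example (placeholder paths, adjust to your own setup):
# results = sweep_buffer_sizes(r"D:\test\big.iso", r"\\server\share\big.iso",
#                              [64 * 1024, 512 * 1024, 5 * 1024 * 1024])
# for size, seconds in results.items():
#     print(f"{size // 1024:>6} KB: {seconds:.1f} s")
```

As with the manual tests, each size should be run several times, since caching and background activity can easily outweigh the effect of the buffer size itself.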

(When copying over my LAN rather than to a VM, everything I've tried runs at the same speed and is bottlenecked by the 1gbit network, give or take the usual randomness of disk and network speeds.)

How do I get the buffer size for Explorer in Process Monitor? I filtered by explorer.exe and started the copy, but I can see no reference to a buffer in the Process Monitor log.

I think some further internal testing needs to be performed by Opus engineers.
It's no longer bleeding edge to have NVMe drives and 10Gb networking between machines and VMs.
I would like to see how you get on with a setup similar to mine.

As for the tests:

Opus
1.) VM to share 12GB file 36s peak 370MB/s

Explorer
1.) VM to share 12GB file 18s peak 780MB/s

Opus is almost exactly half the speed.
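For reference, the averages implied by those timings (as opposed to the peaks) can be worked out as below. This assumes the file is exactly 12 GB and that 1 GB = 1000 MB, so the figures are approximate.

```python
def average_mb_per_s(size_gb, seconds):
    """Average throughput in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / seconds


opus = average_mb_per_s(12, 36)      # 12 GB in 36 s
explorer = average_mb_per_s(12, 18)  # 12 GB in 18 s

print(f"Opus: {opus:.0f} MB/s, Explorer: {explorer:.0f} MB/s, "
      f"ratio: {opus / explorer:.2f}")
# prints "Opus: 333 MB/s, Explorer: 667 MB/s, ratio: 0.50"
```

Both averages sit a little below the reported peaks, and the ratio between them is 0.5.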

Changing the buffer size in 10MB increments adds a performance boost of approx 10-15%, peaking at around 450MB/s, so still some way off Explorer.

@mikeyo I reported a similar effect a while back. There is unfortunately no interest to investigate and fix that issue. Cranking the buffer up to 512 MB somehow made it better. Try setting it much higher. I'm interested how this will affect the speed on a 10G network because I plan to upgrade.

See my thread.

The Details column on the right should show how large each read or write operation is in bytes.

I suspected as much, shame really. Opus excels in all aspects except its own copy handler. I scouted the forums extensively and came across your issue and others, makes for disappointing reading and apparent lack of concern over the issue.

I even set up a new vanilla w10 vm, fresh install of Opus 12 and repeated the same tests to rule out a config/env issue but alas, same results.

What frustrates me is the simplicity of the problem. I mean, copying a large file over a 10gb network to a SAMBA share sitting on nvme drives, a common setup no?

I pay my yearly sub and have done for many years, big fan of Opus since Amiga days so hence the frustration.

Happy to be a part of further testing to help narrow down the potential issue if the Dopus dev team are keen.

Have you tried putting in 512 MB in copy_buffer_size?
And leave copy_nonbufferio_threshold disabled.
Does it change something? I only have Gbit but this helped.

It's not that we aren't interested in this issue.

It's that almost every time it comes up, with a handful of exceptions, it has ended up coming down to something about how the machines/network are configured that is sensitive to buffer size, or some other thing that's outside our control, impossible for us to reproduce locally, and sometimes even favors Opus over Explorer when the wind blows in another direction.

That's when it isn't just the two programs being configured to do different things. (e.g. If you're copying a lot of small files and Opus is set to preserve dates or other metadata but Explorer isn't, then it has a huge impact, and few people think of such things.)

There have been cases where a real issue has been found and we've been able to improve copy speed. One example was with the progress dialogs being updated too often, in a way that was stalling the read/write threads from continuing when using very fast media, and also causing high CPU usage during copies. So there certainly are places where improvements can be made, and we're more than happy to make them, but we're also weary from many years of threads like this where it has turned out to be factors outside our control.

We're looking into getting some 10gig network hardware but it's literally 10x the price of 1gig (and that's for the low-end stuff that's limited to connecting only two rooms/machines). It's not mainstream at all, which is a shame as 1gig is not fast enough even for HDDs these days. (If 10gig was mainstream, I'd have my entire setup on it already.)

In the meantime, I did a test using two RAM disks and localhost networking.

  • RAM disks are even faster than SSD, obviously, so that should tell us something about the maximum throughput Opus can reach.

  • localhost networking won't simulate all the network hardware between machines, of course, but it means the data is at least going through the network stack in Windows, even if it gets to take a huge shortcut without hitting a real network. Best we can really do at short notice without 10gig hardware.

  • Opus isn't involved in sending network packets or low-level filesystem access anyway, so if there is any difference between copying to a share on localhost and copying to a real network share, then any issues introduced there suggest the operating system or network hardware are not correctly buffering the data for smooth transmission.

    If there is a way we can modify our side to make it run more smoothly, we're open to that, even if the problem really is that the OS is failing to do its side properly, but it's difficult to guess what that might be, especially without being able to reproduce the problem locally and play with different buffer sizes and so on.

The performance-critical file copy code is very simple. The code that happens before and after it to decide what to copy where and with which name & metadata, as well as error handling, streaming from archives, etc., is complex, but the main copy loop is very simple:

  • We call ReadFile and WriteFile in a loop, and send updates to the progress dialog every so often.

  • If non-buffered I/O is turned off (the default), then both calls are done on the same thread, but the OS is told to perform readahead buffering. If the OS is doing its job, the reads will be buffered in parallel to the writes.

  • If non-buffered I/O is turned on, we tell the OS not to do its own buffering on the read side, and do it ourselves. A separate thread calls ReadFile in parallel to the main thread calling WriteFile.

  • Note that non-buffered I/O is never used for the network share side of things, even when enabled in Preferences, so in today's test it isn't relevant to the write side of things, since we're copying to a network share.
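The two shapes of loop described above can be sketched roughly as follows. This is an illustration in Python, not the actual Opus code: plain file objects stand in for ReadFile/WriteFile, the function names are made up, and OS-level details such as readahead hints or non-buffered file handles are not modelled.

```python
import queue
import threading


def copy_single_thread(fin, fout, buffer_size, on_progress=None):
    """Read and write on one thread, relying on the OS to read ahead."""
    total = 0
    while True:
        chunk = fin.read(buffer_size)
        if not chunk:  # empty bytes means end of file
            break
        fout.write(chunk)
        total += len(chunk)
        if on_progress:
            # Called per chunk here; a real UI would throttle its updates.
            on_progress(total)
    return total


def copy_reader_thread(fin, fout, buffer_size, depth=4):
    """A reader thread fills a bounded queue while the caller's thread writes."""
    chunks = queue.Queue(maxsize=depth)
    errors = []

    def reader():
        try:
            while True:
                chunk = fin.read(buffer_size)
                chunks.put(chunk)  # blocks when the writer falls behind
                if not chunk:      # the empty chunk doubles as the end marker
                    return
        except Exception as exc:
            errors.append(exc)
            chunks.put(b"")        # unblock the writer so it can report the error

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    total = 0
    while True:
        chunk = chunks.get()
        if not chunk:
            break
        fout.write(chunk)
        total += len(chunk)
    thread.join()
    if errors:
        raise errors[0]
    return total
```

In the second version the queue depth bounds how far the reader can run ahead of the writer, which is the part the OS readahead would otherwise be doing in the first version.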

I set up two 10GB RAM drives, V: and W:, located in physical memory (not the page file) using ImDisk:

A network share pointing to the W: disk was created, and I then copied a ~10GB file from the V: drive to the network share, using Opus, then Explorer, then Opus, then Explorer again. (Since everything is in RAM, caching should not play much of a part, unlike normal tests, so the order of the tests shouldn't matter too much.)

Here's a video showing what happened:

If anything, I'd say Opus is faster than Explorer in this case, just based on the progress dialogs. The two are close enough that I'd just call it even. (Unfortunately, Task Manager doesn't show performance graphs for this kind of disk, so I had to go by the progress dialogs, but those usually favor Explorer and disadvantage Opus, due to Explorer closing the dialog before buffers have been fully flushed, so if Opus looks faster/equivalent here, I'd say it is.)

We're seeing copy speeds of around 2GB/s (2 gigabytes a second) to a (localhost) network share there, copying the ~10GB file in about 3 seconds (or about 4-5 seconds in Explorer's case).

So whatever else is going on, the copy code in Opus is not limited to 350MB/s, we can definitely say that.

Thanks for the thorough explanation of the copy routine. I think the best way to test is with a 10Gb card, or by creating two VMware or Linux KVM virtual machines and setting the virtual NICs to 10Gb; that way you don't need to buy a 10Gb card. For info, my setup is this...

HOST: UNRAID 6.6.6 (Linux KVM) - 2950x, 32GB RAM
MTU: 1500 HOST and VMs
Virtual NIC: 10Gb static IP4

DRIVES:
1 x 860 EVO 2TB (VM images)
1 x 960 EVO 500GB (SAMBA Shares)
1 x Corsair MP510 nvme (passed through direct to Windows 10 VM)

I would like to see your results for copying a 4GB+ file from a W10 VM to a network share on another VM or RAM disk across the 10Gb network.

When I run iperf on the host and vms I can saturate the 10Gb link so my issue is not network related. Also changing MTU to 9000 jumbo frames on affected interfaces makes no difference in my case.

If you can simulate copying a large file over a 10Gb virtual or real network between two VMs or PCs both using RAM disks and get expected results, then I would say something funky is going on with my set up and more probably related to how the SAMBA shares are set up. I also ran speed test on all my drives and get expected performance stats.

@b-s-ger yes, I get an improvement to around 650-700MB/s in both directions. Definitely better. I tried smaller buffers in 50MB increments and 500MB seems to be the best.

Having said that, copying with such a large buffer causes Opus to go sluggish when browsing other network shares during the copy process. I reverted back to the normal buffer size for now.

Here is a recording of me copying from W10 VM on 2GB/s+ nvme to another W10 VM using a RAMDISK with the folder TEST shared.

You can see Opus is a good deal slower than TC. Approx half the speed.

https://resource.dopus.com/uploads/default/original/3X/d/8/d87e791ac0ba6909a61b65058d15807b0be2ebe4.mp4

I have an update on the issue on my Gigabit connection. I have Intel I210 on both ends. Today I have some time so I updated the driver on both ends to the latest version Intel offers (23.5.1). I reset the buffer size to the default 512 KB. I copy the Windows 10 ISO.

Copy TO the server with Dopus is peaking at ~80 MB/sec
Copy TO the server with TC is at full Gbit speed 100 MB/sec

Now the strange improvement since I upgraded the drivers.

Copy FROM the server with Dopus is now at 100 MB/sec (!)
Copy FROM the server with TC is what it was before 100MB/sec

Something has changed since I upgraded the drivers.

If I put in a higher buffer number, like 200 MB I see better speeds copying to the server, but I also see a strange zigzag line in Task Manager. The higher buffer also worsens the copy speed from the server because the speed fluctuates a bit.

Conclusion: Dopus somehow reacts to the different driver versions and TC does not. Can this help you track down the root cause for the difference between programs?

I run a KVM virtio virtual 10Gb interface, and there has been no new driver update since 08/2018.
Granted, there may be some way of optimizing further but I'm not so sure it's my system at fault.
If Opus was 10-15% slower than other FMs like TC and also Windows itself, then fine, I can accept that.

As mentioned in the other thread about this, it's possible TC (when not in its "large file mode" which is reportedly the same speed or slower than Opus in similar situations) is using the shell file copy API, in which case it's not really Opus vs TC vs Explorer, it's just Opus vs Explorer.

If the network hardware drivers (or some other part of the chain) have only been optimised for that one method of copying files, then they'll be slowing down almost literally every other program. e.g. Photoshop or Premiere loading/saving files to the network are likely to be affected the same way Opus is.

If people are seeing such massive differences based on network driver versions then that points there as at least part of what may be going on.


I'm also thinking about writing a test program that tries a few different ways of reading / writing data so people can run it on their systems and we can see how much effect they have and whether one method or another seems to work best across multiple systems. (It's possible it will vary depending on the system, and may even completely not work on some devices, as we found with non-buffered I/O.)

Not sure exactly when/if we will do that, as we're in the middle of some other work at the moment, but it would be nice to get a better understanding of why some setups seem to be unable to buffer reads/writes properly. The drivers and operating system really should take care of this and abstract applications from it, since it's going to slow down almost literally everything that writes large amounts of data to the affected drivers, but maybe we can't rely on them to do so.

Microsoft certainly put a lot of time into speeding up the way Explorer copies in some situations, rather than fixing the low-level APIs so that all ReadFile/WriteFile access is similarly fast, which is a shame, but may be the reality we're in, especially if people writing the network drivers and NAS etc. only test against what Explorer does.

Explorer is the easiest thing to benchmark (and also a misleading benchmark in some ways), but I suspect that if you benchmark how other software is performing with large file opening/saving, you'll find your setup is slowing them down as well. I may be wrong, of course.

That will be due to buffering, and the way Task Manager samples performance data. The overall throughput should normally be about the average of the zig-zag graphs, but sometimes Task Manager will sample the operation when it's writing into a buffer, and sometimes it will sample it when it is waiting for that buffer to finish being sent over the network.

There are multiple buffers in different places, so that's an oversimplification, but it's the gist of what's probably happening, at least if the overall transfer speed is close to the slowest of the network and drives' maximum. The peaks are presumably showing faster speeds than the chain of hardware can actually achieve, since they're measuring the speed something is writing into a buffer (or series of buffers) rather than the true end-to-end speed (which is hard for something like Task Manager to measure).

(It's also possible the zig-zags indicate the chain is stalling / going idle because it's not being fully utilised, but not if the overall speed matches what you'd expect of the slowest hardware in the chain, since you can't go faster than that.)

This is why it's best to measure the length of time that the network is active in Task Manager and to not pay too much attention to the maximum height of the graphs. Overall transfer time is what we care about, rather than max speed. Max speed is really just the maximum speed something can write into a buffer that then sits waiting for the rest of the chain to transmit the data. Task Manager is still a good tool to use, since it lets you see how long the transfer really takes (better than comparing different programs' progress dialogs, at least).
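A toy illustration of why the peak of a graph says little about the overall transfer: the two traces below are made-up once-per-second readings in MB/s, not measurements from any real system, and `summarize` is just a name chosen for this sketch.

```python
def summarize(samples_mb_per_s, interval_s=1.0):
    """Summarise throughput readings taken every interval_s seconds."""
    total_mb = sum(rate * interval_s for rate in samples_mb_per_s)
    duration_s = len(samples_mb_per_s) * interval_s
    return {
        "peak": max(samples_mb_per_s),
        "average": total_mb / duration_s,
        "duration_s": duration_s,
        "total_mb": total_mb,
    }


zigzag = [200, 20] * 6  # made-up trace: bursts into a buffer, then waits
steady = [110] * 12     # made-up trace: same data at a constant rate

print(summarize(zigzag))
print(summarize(steady))
```

Both made-up traces move the same amount of data in the same 12 seconds, even though one peaks at 200 MB/s and the other never exceeds 110 MB/s.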

I searched through the TC Forum because I remember TC had similar problems in the past. This was back in 2011.
The Speedcommander forum also has similar entries.

I also suspect Microsoft is keeping some information to themselves, and that Explorer makes use of CopyFileEx or other APIs that you can't use or that they don't document. If you ever come around to writing that test program, I'd be glad to run it and help track down the root cause.

What about using CopyFileEx instead? Maybe there can be some implementation done in DOpus for this, so that when copying to network drives/shares, DOpus would use the CopyFileEx instead of the built-in handler.

So just for the network side of things, for the rest DOpus could use the current implementation. Would that be too complicated to implement?
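For anyone who wants to compare CopyFileEx against a plain read/write loop on their own share, a minimal Python sketch using ctypes might look like this. It is only an illustration of calling the Win32 API, not how Opus does or would implement it; `copy_with_copyfileex` is a made-up name, no progress callback is used, and on non-Windows systems it simply falls back to `shutil.copyfile` so the script still runs.

```python
import ctypes
import shutil
import sys

COPY_FILE_NO_BUFFERING = 0x00001000  # CopyFileEx flag intended for large files


def copy_with_copyfileex(src, dst, no_buffering=False):
    """Copy src to dst via CopyFileExW on Windows; use shutil elsewhere."""
    if sys.platform != "win32":
        shutil.copyfile(src, dst)  # fallback so the sketch runs anywhere
        return

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    copy_file_ex = kernel32.CopyFileExW
    copy_file_ex.argtypes = [
        wintypes.LPCWSTR,               # lpExistingFileName
        wintypes.LPCWSTR,               # lpNewFileName
        ctypes.c_void_p,                # lpProgressRoutine (not used here)
        ctypes.c_void_p,                # lpData
        ctypes.POINTER(wintypes.BOOL),  # pbCancel
        wintypes.DWORD,                 # dwCopyFlags
    ]
    copy_file_ex.restype = wintypes.BOOL

    flags = COPY_FILE_NO_BUFFERING if no_buffering else 0
    if not copy_file_ex(str(src), str(dst), None, None, None, flags):
        raise ctypes.WinError(ctypes.get_last_error())


# Example (placeholder paths):
# copy_with_copyfileex(r"D:\test\big.iso", r"\\server\share\big.iso")
```

Timing this against a manual loop on the same file, using the Task Manager method described earlier in the thread, would show whether the API choice matters on a given setup.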

It's planned: