Copying / Moving Files in LAN very slow in Opus vs Explorer

What happens if you use Total Commander's "Big File copy mode"? Are we comparing the shell file copy against everything else again, or are there more than two copy methods in play, which would show whether the other end is sensitive to the way the data is sent?

The buffer size making such a difference suggests that trying more buffer sizes may find a sweet spot for the network or device at the other end. Process Monitor can show the buffer size Explorer (or TC) is using when copying to the network device, which is worth a try in Opus.

Have you tried with a standard Windows server at the other end to compare that? That is what I use, and the link gets saturated no matter which software is sending the data, as it should.

I only see slow copies when doing it over wireless (which is slow anyway). When I'm hard-wired to Ethernet I don't see slowness issues. For instance, last night I copied 13 GB of MP4s to my NAS: wireless took about 14 minutes, hard-wired Ethernet took about 1.5 minutes. That's using 5400 RPM drives (5900 RPM, these being helium-filled WD Red 8TB NAS drives) and a 5400 RPM drive in my laptop. 1.5 minutes for 13 GB isn't bad in my eyes. I'm using DirOpus 11. Maybe Opus 12 is slower than 11?

Copy speed in 11 and 12 should be the same.

Horrible. Even slower than Dopus. Even if I set the buffer to something like 10240 k for different disks. Total Commander is as fast as the Windows Explorer (both directions) if I simply use the "Standard Copy method".

The simplest test is copying a Windows 10 ISO file. It's totally consistent. I copy with Dopus TO the server and then FROM the server.
I have also tried this tip with the button to use nircmd.

This gives me full speed.

Reading this, I have now put in some ridiculous number in copy_buffer_size: 512 MB
And voila, I see much better speeds. But also strange fluctuations.

It's a Windows Server 2016 at the end.

OK, from the speed perspective, I think I reached a better speed with that ridiculous buffer size of 512 MB. I still think it's a bug in Dopus that needs to be investigated and fixed once and for all.
If you can write me how to configure Process Monitor I can help hunt down the root cause.

Same issue for me although shares sit on a Linux FS. Opus is consistently 50% slower in xfer to the shares. Oddly, much faster from the share in comparison, bizarre!

Have responded here with some testing I did today:

I'm not on 10G yet but I have received my new network card. It's the Solo 10G running on Thunderbolt 3. Still on 1G. My situation is WORSE than before.

Copying with Explorer = Full Speed
Copying with Total Commander = Full Speed
Copying with Dopus and 500 KB buffer = About a third of the speed is missing (roughly 75-85 MB/sec)
Copying with Dopus and 500 MB buffer = Dopus completely HANGS. No file transfer. No network activity after 1.7 GB of 8 GB is copied (as reported by Dopus).
I can't abort, I can't kill the process. I created a full dump with Process Explorer that I can send you.

All Opus does is call ReadFile and WriteFile in a loop. If something changes and that stops working, it is very unlikely to be due to Opus, since Opus is doing the same exact thing as before and isn't what has changed.
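As a rough illustration of what a read/write loop of that kind looks like, here is a minimal Python sketch. It is not Opus's code: the function name `copy_sync` and the 512 KB default are assumptions for illustration, and Python's unbuffered file objects stand in for the Win32 ReadFile and WriteFile calls. The point is only that each pass issues one blocking read and one blocking write of `buffer_size` bytes, so the buffer size decides how many write calls the destination sees.

```python
import os
import tempfile


def copy_sync(src_path, dst_path, buffer_size=512 * 1024):
    """Copy a file with one blocking read and one blocking write per pass.

    Hypothetical helper for illustration only; returns the bytes copied.
    """
    copied = 0
    # buffering=0 gives raw, unbuffered file objects, so every read()/write()
    # below maps to one call into the operating system.
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "wb", buffering=0) as dst:
        while True:
            chunk = src.read(buffer_size)      # stands in for ReadFile
            if not chunk:
                break                          # end of file
            view = memoryview(chunk)
            while view:                        # a raw write may be partial
                written = dst.write(view)      # stands in for WriteFile
                view = view[written:]
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(2 * 1024 * 1024))
        print(copy_sync(src, dst), "bytes copied")
```

With a loop like this, a hang inside the write call leaves the program with nothing to do but wait, which is the situation the dump describes.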

The dump shows Opus was waiting for WriteFile to complete when the snapshot was taken, while calling it with a buffer ~525 MB in size:

[Screenshot: dump showing the pending WriteFile call]

WriteFile should not hang. If it is hanging, something at a much lower level than Opus is going wrong. Possibly network drivers or network hardware, or antivirus/firewall. It's not Opus.

If the network card is the thing that just changed, I would suspect the card or the drivers installed for it cannot cope with that size of write buffer and hang when such a large write is requested. Try a smaller buffer, see if there are newer drivers (the network drivers built into Windows 10 are often surprisingly bad/wrong, especially for newer hardware) and possibly complain to the manufacturer that their card/driver is hanging with large writes.

Leo, I have searched through the forum and found this exact problem reported since 2010, I think. Many such posts. I have now personally tested with an Intel I210 that is in my Thunderbolt Dock. I have tested with a Sonnet Solo 10 Thunderbolt Edition which contains an Aquantia AQtion AQC107S. I see the same effect regardless. I have disabled Antivirus and have the same issue.

I have now created three sets of files I will send you via DropBox.

Copy via Dopus (standard buffer size).
Copy via Robocopy.
Copy via Total Commander.

I have screenshots and Process Monitor logs (full set). If there is anything else I can provide to track down the root cause of this problem, which is in my opinion clearly a Dopus issue, let me know. I will try my best to find the problem.

Any news on all the info that I sent? Is it even helpful?

Haven't had a chance to look at the logs in detail yet. We'll respond once we have.

Re the Total Commander log sent yesterday, it's most likely that it's just using the shell file copy API as we've mentioned before. Unless that log was done with its "large mode" enabled, which I think you said was slower than Opus?

There are lots of threads about copy speed on the forum, that is true. Most of them turned out to be people not measuring things properly (I'm not saying that is the case here; it doesn't look like it is). Some were real issues as well, either in what we're doing (all known issues there have been addressed as far as reasonable) or in how the multitude of other involved components behave (as you saw yourself, with massive speed differences with different network drivers and buffer sizes). We'll look at the logs to see if they point to where the problem is, once we're not in the middle of other things.

We're also planning to get some 10gig networking equipment to see how that behaves in one of our environments, but that will take some time as well, and won't start until the work we're currently busy on is done. (It's also quite possible we won't see the same issues with our setups, of course, depending on where the issues are, but either way it will give us some insight into what happens with 10gig networks and either something to fix on our side or something to present for comparison to what other people are seeing which may help find where the issue is if it's somewhere else.)

I've been planning to make a video about file copy speed, and how to measure things properly and what to look for, for some time, but I don't know when/if that will happen.

Ok thanks. Directory Opus is stellar in every aspect. Just that darn copy / move speed issue is driving me nuts. I'll receive my new 10G capable switch soon (on its way from China) and will do more tests once it arrives.

I had to go back to the default of 512 KB for the buffer size with the new Sonnet Solo Thunderbolt network card, because transfers were not reliable. Sometimes Dopus would hang, sometimes not.

Maybe it helps, I just did another test.

Copying TO the server I get around 80MB/sec
Copying FROM the server I get the full speed, 100 MB/sec.

All goes to/from the same drives.

Tested Teracopy: Full speed back and forth from/to server.

I've looked at the log files now.

To answer the question of "How does Total Commander do it?", as you have it configured, TC was just calling the shell file copy API.

We've gone over that more than once in this thread and others.

The logs show both TC and RoboCopy are using the shell file copy API, in fact. So, again, we aren't comparing how your setup responds to lots of different programs vs Opus; we're comparing how your setup responds to the shell CopyFileEx API vs Opus. There are only two methods of copying files in play here.

That said, having logs of both TC and RoboCopy calling the same API was still useful as they show there is still quite a large difference between the two runs, despite them using the same very-high-level API to copy the file. (More on this below.)

That shows the speed varies a lot, even when doing things in an identical way. The margins of error here are large, into 10+ seconds.


Your tests were done in this order: Opus, RoboCopy, TC. That put Opus at a potential disadvantage because, after it had copied the file, the file data may have been cached in RAM, causing the read side to be much faster in subsequent tests. That is backed up by your disk usage graphs during the copy:

The disk usage graphs include all processes so we don't know how much is due to each program, but they give us a maximum usage.

Local disk activity while Opus was copying was about 80MB/s:

[Screenshot: local disk activity during the Opus copy]

Local disk activity while RoboCopy was copying was virtually nothing:

(Note the scale change between screenshots: 100 MB/s vs 1 MB/s.)

[Screenshot: local disk activity during the RoboCopy copy]

Local disk activity while TC was copying was also virtually nothing:

[Screenshot: local disk activity during the TC copy]

As I've said many times, in many threads about copy speed, the order of the test can matter a lot, and tests should be repeated multiple times to account for potential caching. (Or you'd need to do a full power-down and reboot between tests, to wipe out the software and hardware caches.)

I doubt this accounts for all of the difference in this case, but it may account for some of it. (It's also possible it's fairly irrelevant, as time spent waiting on the network massively dwarfs the time spent waiting on the local disk. But any proper test should avoid it as a potential unfairness.)
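One way to make the ordering point concrete is a small timing harness that runs every copy method several times and rotates which one goes first in each round. The sketch below is an assumption about how such a test could be structured, not a tool anyone in this thread used; `benchmark` and `summarize` are hypothetical names, and each entry in `methods` would be a callable that performs one complete copy (deleting the destination file first). It does not flush any cache; rotating the order only spreads the caching advantage around instead of always giving it to the same program.

```python
import statistics
import time


def benchmark(methods, rounds=5):
    """Time each callable in `methods` once per round, rotating the start order.

    `methods` maps a label to a zero-argument callable that performs one copy.
    Returns a dict of label -> list of elapsed seconds, one entry per round.
    """
    names = list(methods)
    timings = {name: [] for name in names}
    for round_index in range(rounds):
        shift = round_index % len(names)
        order = names[shift:] + names[:shift]   # a different method goes first
        for name in order:
            start = time.perf_counter()
            methods[name]()
            timings[name].append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce each list of timings to median, min, max and spread in seconds."""
    return {
        name: {
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
            "spread": max(values) - min(values),
        }
        for name, values in timings.items()
    }
```

If the spread for a single method is several seconds, a difference of the same size between two methods says very little on its own.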


Based purely on the ProcMon logs (an apples-to-apples comparison), and ignoring what each program reports as the copy speed/time (which is apples-to-oranges), we can compare the three tests like this:

File size was 7,948,206,080 bytes = 7580 MB

Program   Buffer    Mode                   Start     End       Duration  Speed       TWT
Opus      512 KB    Custom, Sync I/O       18:59:04  19:00:47  103 sec   73.6 MB/s   ~0.06 sec
RoboCopy  1024 KB   CopyFileEx, Async I/O  19:08:26  19:09:47  81 sec    93.6 MB/s   ~0.07 sec
TC        1024 KB   CopyFileEx, Async I/O  19:13:05  19:14:19  74 sec    102.4 MB/s  ~0.07 sec
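For anyone who wants to redo this arithmetic on their own ProcMon logs, the Duration and Speed columns can be reproduced from the start and end timestamps and the file size. The short Python sketch below uses the figures from the table above; the helper names are made up for illustration, and "MB" is taken as 1024 × 1024 bytes, which is what makes the file size come out at exactly 7580 MB.

```python
from datetime import datetime

FILE_SIZE_BYTES = 7_948_206_080
BYTES_PER_MB = 1024 * 1024          # "MB" as used in the table

# (start, end) timestamps taken from the table
RUNS = {
    "Opus": ("18:59:04", "19:00:47"),
    "RoboCopy": ("19:08:26", "19:09:47"),
    "TC": ("19:13:05", "19:14:19"),
}


def duration_seconds(start, end):
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def speed_mb_per_s(size_bytes, seconds):
    """Average throughput in MB/s for a copy of size_bytes taking seconds."""
    return size_bytes / BYTES_PER_MB / seconds


RESULTS = {
    name: (
        duration_seconds(start, end),
        round(speed_mb_per_s(FILE_SIZE_BYTES, duration_seconds(start, end)), 1),
    )
    for name, (start, end) in RUNS.items()
}

for name, (seconds, speed) in RESULTS.items():
    print(f"{name:<9} {seconds:>4} sec  {speed:>6} MB/s")
```

Measuring from the log timestamps rather than from each program's own progress dialog is what keeps the comparison apples-to-apples.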

Observations:

  • Using a 1 MB = 1024 KB buffer size in Opus may be worth a try, so things are even there.

  • There's a 7 second difference between the RoboCopy and TC tests, despite both using the same high-level API to copy the data. That's over a quarter of the difference between Opus and RoboCopy, and shows we have quite a large margin of error. On top of the caching issue mentioned above, it backs up what I've been saying about needing to do multiple tests. I doubt that this and/or the caching issue account for the whole difference; I just suspect things may be closer than they look.

  • Each subsequent test is faster than the previous one, which may partially be due to caching (as discussed already). Some of that may also be down to random factors (which is why multiple tests are important).

  • The TWT column is interesting, and may be the key to what's happening. This is what I've called "typical write time", and is (very roughly, and from a very quick look) about how long each WriteFile call takes.

    With the test setup, Opus pushed 512 KB of data each time it called WriteFile, and that took about 0.06 seconds each time. But the other programs, using CopyFileEx, were pushing twice as much data for only about 0.01 seconds of extra time. That is probably very significant.

    It suggests to me that some part of the system is not buffering data properly, or that the actual transfer times are being dwarfed by some other overhead (e.g. some part of the network stack, or antivirus). The fewer times WriteFile is called, the faster things get, which should not generally be the case with buffered writes. (The data each program hands to WriteFile should go into an appropriately sized buffer (or a 'sliding window' type system, or a cycle of multiple buffers, etc.) allocated by the operating system, and that buffer is what is actually sent over the network. The programs writing into the system-allocated buffer should be abstracted from it, but obviously aren't for some reason (hence the sensitivity to buffer sizes).)

    Something that can happen with fast networks (depending on how they are configured, and the hardware involved) is that if they aren't given enough data to fill a full packet, they may delay for a moment to see if more data arrives. If more data arrives, they can pack it all together. If it doesn't, they may stop waiting and send an incomplete packet. How that is tuned can affect throughput and latency (which are usually in conflict with each other, and networks with different aims tune for one or the other, or a compromise between the two).

    So maybe (some of) what we're seeing is one application-level buffer size working better with how this network is tuned than the other. The application-level buffer shouldn't matter if the operating system, network stack, etc. are all doing their jobs properly, but we aren't in an ideal world.

    Another potential difference is that CopyFileEx is using asynchronous I/O while Opus is using synchronous I/O. Most software uses synchronous I/O, like Opus, as writing for async I/O is extremely complex and error-prone, and should not -- in theory -- make a difference when doing a simple, sequential write, as the operating system is supposed to take care of things with its own buffering, and certainly seems to in my own local tests; maybe it doesn't in all cases. (Unfortunately, when Microsoft worked on copy speed problems of their own back in Windows Vista, they changed the high-level CopyFileEx rather than addressing the low-level WriteFile/sync-buffering side of things. I have no idea why, other than they probably just wanted a quick fix to the more visible issue without doing the hard work of improving OS-wide performance that people were less likely to notice. Great for things that call CopyFileEx, but useless for everyone else.)

    It's highly likely that these issues will affect just about everything that writes data to the network drive using anything other than CopyFileEx, as most software just opens a standard, buffered, synchronous write handle and writes data to it. CopyFileEx is actually the odd one out here, and the unusual case; it just happens that things which call CopyFileEx and Opus file copies are the things you tend to look at and that report speeds and times. (When was the last time you benchmarked how quickly Photoshop or whatever could write data to the network? I'm guessing never. But it and most other software are probably affected by this issue as well.)

We could look into making Opus use async I/O but that is a complex change that would need a lot of testing, so it isn't going to happen overnight. It's also not guaranteed to account for whatever difference is left after the testing issues discussed above are corrected, but it might. Our biggest problem is that we have yet to reproduce any meaningful speed differences on our own setups between Opus and other software, which makes it difficult to test different theories.
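To give a feel for the difference between a strictly sequential loop and one that keeps more than one operation going, here is a hedged Python sketch of a pipelined copy: a reader thread fills a small bounded queue while the main thread writes, so the next chunk is usually ready the moment a write returns. This is not Windows overlapped I/O, not how CopyFileEx works internally, and not anything Opus does; `copy_pipelined`, the 1 MB chunk size and the queue depth of 4 are all assumptions chosen for illustration.

```python
import queue
import threading


def copy_pipelined(src_path, dst_path, buffer_size=1024 * 1024, depth=4):
    """Copy a file while a background thread reads ahead of the writer.

    Hypothetical helper: `depth` caps how many chunks may wait in memory.
    Returns the number of bytes copied.
    """
    chunks = queue.Queue(maxsize=depth)
    errors = []

    def reader():
        try:
            with open(src_path, "rb", buffering=0) as src:
                while True:
                    chunk = src.read(buffer_size)
                    chunks.put(chunk)          # an empty chunk marks the end
                    if not chunk:
                        break
        except Exception as exc:               # surface read errors to caller
            errors.append(exc)
            chunks.put(b"")

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    copied = 0
    with open(dst_path, "wb") as dst:
        while True:
            chunk = chunks.get()
            if not chunk:
                break
            dst.write(chunk)                   # reader keeps working meanwhile
            copied += len(chunk)
    thread.join()
    if errors:
        raise errors[0]
    return copied
```

Even this simple version needs a sentinel for end of file, error hand-off between threads and a bound on memory use, which hints at why moving a mature copy engine to real asynchronous I/O is not a small change.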

Thank you for the in-depth explanation. What I can tell you regarding the order of the tests: it makes no difference. I tested this multiple times, Opus first and other software first. The max speed Opus achieves is always around 73-75 MB/sec. The information regarding asynchronous/synchronous I/O is interesting, and might be the explanation for what is going on. See this:

Windows Explorer CopyFile behavior

Starting with Windows 7, customers experience very fast copy speeds to and from the HELIOS PCShare file server. Microsoft internally optimized the CopyFile API which does up to eight asynchronous reads or writes in parallel, each of 32 kByte size. This feature is only available with the CopyFile API or doing asynchronous reads and writes in parallel, e.g. with multiple threads. The new CopyFile is an improvement for SMB servers when clients copy files using the Explorer or the CopyFile API. HELIOS LanTest measures the performance with sequential read/write operations using a specified block size, e.g. 128 kByte using Gigabit Ethernet testing. This reflects much better how applications work because it is very unlikely that applications use multiple asynchronous I/O requests in parallel. Depending on the network, client and server performance in a LanTest measurement of a 1 Gb network may get around 60 MB/sec reading and writing, the Windows Copy file may get around 100 MB due to multiple asynchronous I/Os.

Just got my 10 Gbit switch and connected everything. I'm currently trying everything out. So far I can see Dopus is faster than on 1 Gbit, around 200 MB/sec; with Teracopy I get around 400. I still have to fiddle around with settings but the overall problem is the same.

I have news now that I have configured my 10Gbit gear. First of all, I can't reach full 10 Gbit speed since I push the limits of my setup (Thunderbolt 3 card connected to Laptop).
I always started with Windows Explorer, then Total Commander then Directory Opus. If caching is involved, Opus will be the beneficiary.

First I copied my 8GB test file from the server to my Laptop. The file is deleted before each transfer.

Explorer     Total Commander   Directory Opus
545MB/sec    665MB/sec         500MB/sec

Now I copied the file to the server from my Laptop. I deleted the file before every transfer.

Explorer     Total Commander   Directory Opus
518MB/sec    555MB/sec         185MB/sec

Now the strange outlier is actually copying TO the server with Directory Opus. I currently have the standard buffer size.

I'm not sure why you'd see such an extreme drop-off there. Something is definitely not buffering data properly in that situation.

We've spent the last few days looking into making Opus use CopyFileEx like Explorer and TC do, in situations where it can be used (simple file-to-file copies, nothing involving archives, etc.). It looks like that will work, but it needs a lot of testing and has some knock-on effects on other code, so it won't be finished for a while. But that should bring parity between Opus and other tools that use CopyFileEx, on setups that only work at full speed when that exact method is used to copy files.

It'd be interesting to know the max speed of, say, extracting uncompressed data to the same destination using 7-Zip. My guess is you will see almost literally everything other than CopyFileEx hitting that speed limit. It's just you don't often measure anything else.

Just tested unpacking because I have time this week. I could only glance at the Performance tab in Task Manager because Total Commander does not report speed as it unpacks to a network share. Both Directory Opus and Total Commander show about 1.6 Gbps on the tab.
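Since Task Manager reports bits per second while the copy dialogs report bytes per second, a quick conversion helps when lining the two up. The Python sketch below is only a unit conversion under stated assumptions: it treats 1 Gbps as 1000 megabits per second and 1 MB as 10^6 bytes, and the function names are made up for illustration. With binary megabytes (2^20 bytes) the same 1.6 Gbps comes out at roughly 191 MB/sec instead.

```python
def gbps_to_mb_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def mb_per_s_to_gbps(mb_per_s):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


if __name__ == "__main__":
    print(f"1.6 Gbps is about {gbps_to_mb_per_s(1.6):.0f} MB/sec")
    print(f"185 MB/sec is about {mb_per_s_to_gbps(185):.2f} Gbps")
```

On that basis the unpacking figure is in the same range as the roughly 185 to 200 MB/sec seen when Opus copied to the server with the default buffer.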

I have now set the copy buffer size again up to 500MB and I see much better throughput in the 500MB/sec range. Saying that, let me apologize and explain further because setting the buffer size made Opus actually hang before.

I found that the driver of the network card has a bug if you enable Jumbo Frames. All kinds of weirdness happened, like RDP dropping but copying was still working. Now my server is happy with the jumbo frames and humming along but the Sonnet doesn't work with them. But I will test further.

Another effect of the synchronous I/O issue that I experience?
If I copy files over to my server with Opus, my music coming from the same server (Logitech Media Server) stutters for microseconds (audible).
If I copy stuff over with Total Commander, no such stuttering although the speed is higher (almost 700MB/sec).