I've looked at the log files now.
To answer the question of "How does Total Commander do it?", as you have it configured, TC was just calling the shell file copy API:
We've gone over that more than once in this thread and others.
The logs show both TC and RoboCopy are using the shell file copy API, in fact. So, again, we aren't comparing how your setup responds to lots of different programs vs Opus; we're comparing how your setup responds to the shell CopyFileEx API vs Opus. There are only two methods of copying files in play here.
That said, having logs of both TC and RoboCopy calling the same API was still useful as they show there is still quite a large difference between the two runs, despite them using the same very-high-level API to copy the file. (More on this below.)
That shows the speed varies a lot, even when doing things in an identical way. The margin of error here is large, running into the 10+ second range.
Your tests were done in this order: Opus, RoboCopy, TC. That put Opus at a potential disadvantage because, after it had copied the file, the file data may have been cached in RAM, causing the read side to be much faster in subsequent tests. That is backed up by your disk usage graphs during the copy:
The disk usage graphs include all processes so we don't know how much is due to each program, but they give us a maximum usage.
Local disk activity while Opus was copying was about 80MB/s:
Local disk activity while RoboCopy was copying was virtually nothing:
(Note the scale change between screenshots: 100 MB/s vs 1 MB/s.)
Local disk activity while TC was copying was also virtually nothing:
As I've said many times, in many threads about copy speed, the order of the test can matter a lot, and tests should be repeated multiple times to account for potential caching. (Or you'd need to do a full power-down and reboot between tests, to wipe out the software and hardware caches.)
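To illustrate what I mean, here's a rough sketch (in Python, with hypothetical paths) of repeating the same copy several times. If the first run looks very different from the rest, caching is probably involved:

```python
# Rough sketch: repeat the same copy and compare the individual runs.
# Paths and run counts are just placeholders for illustration.
import shutil
import time

def timed_copy(src, dst, runs=3):
    """Copy src to dst several times, returning the time each run took."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        shutil.copyfile(src, dst)
        times.append(time.perf_counter() - start)
    return times
```

Ideally you'd also flush caches (or reboot) between runs; short of that, at least the run-to-run variation becomes visible.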
I doubt this accounts for all of the difference in this case, but it may account for some of it. (It's also possible it's fairly irrelevant, as time spent waiting on the network massively dwarfs the time spent waiting on the local disk. But any proper test should avoid it as a potential source of unfairness.)
Based purely on the ProcMon logs (an apples-to-apples comparison), and ignoring what each program reports as the copy speed/time (which is apples-to-oranges), we can compare the three tests like this:
File size was 7,948,206,080 bytes = 7580 MB
Opus: Custom, Sync I/O
RoboCopy: CopyFileEx, Async I/O
TC: CopyFileEx, Async I/O
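(For reference, the MB figure is just the byte count divided by 1024², which you can sanity-check:)

```python
# Sanity check of the size conversion above (1 MB = 1024 * 1024 bytes)
size_bytes = 7_948_206_080
size_mb = size_bytes / (1024 * 1024)
print(size_mb)  # 7580.0
```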
Using a 1 MB (= 1024 KB) buffer size in Opus may be worth a try, so that things are even there.
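If you want to experiment with buffer sizes outside of Opus as well, a plain loop along these lines (Python sketch, hypothetical paths) is roughly what a synchronous copier does, with the chunk size exposed so you can compare 512 KB vs 1 MB directly:

```python
# Minimal sketch of a synchronous, buffered copy loop with a
# caller-chosen chunk size. Paths are placeholders.
def copy_with_buffer(src_path, dst_path, buf_size):
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(buf_size)
            if not chunk:
                break
            dst.write(chunk)

# e.g. compare:
#   copy_with_buffer(src, dst, 512 * 1024)
#   copy_with_buffer(src, dst, 1024 * 1024)
```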
There's a 6 second difference between the RoboCopy and TC tests, despite both using the same high-level API to copy the data. That's over a quarter of the difference between Opus and RoboCopy, and shows we have quite a large margin of error. On top of the caching issue mentioned above, it backs up what I've been saying about needing to do multiple tests. I doubt that this and/or the caching issue account for the whole difference; I just suspect things may be closer than they look.
Each subsequent test is faster than the previous one, which may partially be due to caching (as discussed already). Some of that may also be down to random factors (which is why multiple tests are important).
The TWT column is interesting, and may be the key to what's happening. This is what I've called "typical write time", and is (very roughly, and from a very quick look) about how long each WriteFile call takes.
With the test setup, Opus pushed 512 KB of data each time it called WriteFile, and that took about 0.06 seconds each time. But the other programs, using CopyFileEx, were pushing twice as much data for only about 0.01 seconds of extra time. That is probably very significant.
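Using those (very rough) numbers, the per-call throughput works out to nearly double for the CopyFileEx runs:

```python
# Back-of-the-envelope figures from the approximate timings above
opus_kb_per_call_s = 512 / 0.06    # ~8533 KB/s per WriteFile call
cfe_kb_per_call_s = 1024 / 0.07    # ~14629 KB/s: twice the data, ~0.01s more
print(round(cfe_kb_per_call_s / opus_kb_per_call_s, 2))  # 1.71
```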
It suggests to me that some part of the system is not buffering data properly, or that the actual transfer times are being dwarfed by some other overhead (e.g. some part of the network stack, or antivirus). The fewer times WriteFile is called, the faster things get, which should not generally be the case with buffered writes. The data each program hands to WriteFile should go into an appropriately sized buffer (or a 'sliding window' type system, or a cycle of multiple buffers, etc.) allocated by the operating system, and that buffer is what is actually sent over the network. The programs writing into the system-allocated buffer should be abstracted from it, but obviously aren't for some reason (hence the sensitivity to buffer sizes).
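To be concrete about what "buffering properly" would mean, here's a toy model (just an illustration of the idea, not what Windows actually does internally): the caller can write in whatever chunk size it likes, and the same fixed-size blocks still go out the other side, so the caller's buffer size shouldn't matter.

```python
class CoalescingWriter:
    """Toy model of OS-level write buffering: callers write in any chunk
    size, but data is sent onward in fixed-size blocks, so the caller's
    chunk size makes no difference to what hits the network."""

    def __init__(self, sink, block_size=1024 * 1024):
        self.sink = sink           # anything with a .write(bytes) method
        self.block_size = block_size
        self.buf = bytearray()
        self.flushes = 0           # how many blocks have been sent

    def write(self, data):
        self.buf += data
        while len(self.buf) >= self.block_size:
            self.sink.write(bytes(self.buf[:self.block_size]))
            del self.buf[:self.block_size]
            self.flushes += 1

    def close(self):
        if self.buf:               # send any final partial block
            self.sink.write(bytes(self.buf))
            self.buf.clear()
            self.flushes += 1
```

With something like this in between, four 512 KB writes and two 1 MB writes produce exactly the same traffic, which is why the observed sensitivity to buffer size is suspicious.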
Something that can happen with fast networks (depending on how they are configured, and the hardware involved) is that if they aren't given enough data to fill a full packet, they may delay for a moment to see if more data arrives. If more data arrives, they can pack it all together. If it doesn't, they may stop waiting and send an incomplete packet. How that is tuned can affect throughput and latency (which are usually in conflict with each other, and networks with different aims tune for one or the other, or a compromise between the two).
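As a toy model of that packing behaviour (the 1460-byte figure below is just a typical Ethernet TCP payload size, not something measured from your network):

```python
def packets_sent(write_sizes, payload=1460):
    """Toy model of packet coalescing: data accumulates until there's a
    full packet's worth, full packets go out, and one final partial
    packet is sent at the end."""
    pending = 0
    packets = 0
    for size in write_sizes:
        pending += size
        while pending >= payload:
            pending -= payload
            packets += 1
    if pending:
        packets += 1
    return packets

# Many small writes coalesce into far fewer packets than one per write:
# packets_sent([100] * 10) -> 1, not 10
```

The delay-and-pack behaviour is what makes throughput sensitive to how data is handed to the network in the first place.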
So maybe (some of) what we're seeing is one application-level buffer size working better with how this network is tuned than the other. The application-level buffer shouldn't matter if the operating system, network stack, etc. are all doing their jobs properly, but we aren't in an ideal world.
Another potential difference is that CopyFileEx is using asynchronous I/O while Opus is using synchronous I/O. Most software uses synchronous I/O, like Opus, as writing async I/O code is extremely complex and error-prone. In theory, that should not make a difference when doing a simple, sequential write, as the operating system is supposed to take care of things with its own buffering, and it certainly seems to in my own local tests; maybe it doesn't in all cases. (Unfortunately, when Microsoft worked on copy speed problems of their own back in Windows Vista, they changed the high-level CopyFileEx rather than addressing the low-level WriteFile/sync-buffering side of things. I have no idea why, other than that they probably just wanted a quick fix to the more visible issue without doing the hard work of improving OS-wide performance that people were less likely to notice. Great for things that call CopyFileEx, but useless for everyone else.)
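For what it's worth, the essence of async I/O here is just keeping reads and writes in flight at the same time. Here's a rough concept sketch using a reader thread (this is not how CopyFileEx is actually implemented; it's just to show the idea of overlapping the two sides of a copy):

```python
# Concept sketch of overlapped copying: a reader thread keeps a small
# queue of buffers full while the main thread writes, so reads and
# writes can proceed concurrently instead of strictly alternating.
import queue
import threading

def overlapped_copy(src, dst, buf_size=1024 * 1024, depth=2):
    q = queue.Queue(maxsize=depth)

    def reader():
        while True:
            chunk = src.read(buf_size)
            q.put(chunk)          # an empty chunk signals end-of-file
            if not chunk:
                return

    t = threading.Thread(target=reader)
    t.start()
    while True:
        chunk = q.get()
        if not chunk:
            break
        dst.write(chunk)
    t.join()
```

With purely synchronous I/O, the program reads, then writes, then reads again, so the disk sits idle while the network is busy and vice versa; overlapping the two is where async I/O can (in theory) win.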
It's highly likely that these issues will affect just about everything that writes data to the network drive using anything other than CopyFileEx, as most software just opens a standard, buffered, synchronous write handle and writes data to it. CopyFileEx is actually the odd one out here, and the unusual case; it just happens that programs which call CopyFileEx, and Opus file copies, are the things you tend to look at, and the things that report speeds and times. (When was the last time you benchmarked how quickly Photoshop or whatever could write data to the network? I'm guessing never. But it and most other software are probably affected by this issue as well.)
We could look into making Opus use async I/O but that is a complex change that would need a lot of testing, so it isn't going to happen overnight. It's also not guaranteed to account for whatever difference is left after the testing issues discussed above are corrected, but it might. Our biggest problem is that we have yet to reproduce any meaningful speed differences on our own setups between Opus and other software, which makes it difficult to test different theories.