It is clear that copy_nonbufferio_threshold is not being respected: the copy is being done with buffered I/O instead of the non-buffered I/O I expected.
Nonbuffered I/O affects memory usage and avoids pushing other filesystem data out of caches, by preventing the filesystem from caching the data being copied in memory.
The graphs don't show memory usage so I'm not sure what you're expecting here or what these graphs are meant to show.
I'm showing that non-buffered I/O isn't being used by Directory Opus, despite the advanced setting specifying a 1MB file size threshold for non-buffered I/O during a file copy.
The first image shows obvious buffering: only one file is being copied, yet the disk transfer rate has intermittent slowdowns while the depleted buffer is refilled.
The second image shows non-buffered I/O as invoked through xcopy on the same file, which is the behaviour I expected. Note that the read speed has also increased as a result.
The destination is a ramdisk with effectively 1 GB/s of read+write speed (as measured by copyback).
There will always be buffering. Data has to be read into memory in order to then write it to somewhere else. The question is who does the buffering (the filesystem or the program itself, or some mixture of both) and what effect that buffering has on other data the filesystem is caching in otherwise unused memory.
Non-buffered I/O means the filesystem isn't buffering the data. Instead, Opus allocates a (fairly large) buffer and reads and writes to and from it in parallel from separate threads.
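To make the "program does its own buffering" idea concrete, here is a minimal Python sketch of a copy loop that reads and writes in parallel through a bounded in-memory buffer. It is only an illustration of the general pattern, not how Opus is actually implemented: the function name `threaded_copy`, the chunk size, and the buffer limit are all made up for the example, and it uses ordinary Python file objects rather than real non-buffered file handles.

```python
import queue
import threading

CHUNK_SIZE = 1024 * 1024   # hypothetical chunk size (1 MiB), not Opus's real value
MAX_CHUNKS_IN_FLIGHT = 8   # hypothetical cap on how much the program buffers itself


def threaded_copy(src, dst, chunk_size=CHUNK_SIZE, max_chunks=MAX_CHUNKS_IN_FLIGHT):
    """Copy between two binary file objects, reading and writing in parallel.

    Returns the number of bytes written.
    """
    chunks = queue.Queue(maxsize=max_chunks)  # the program's own bounded buffer
    stop = threading.Event()
    errors = []

    def reader():
        try:
            while not stop.is_set():
                data = src.read(chunk_size)
                if not data:
                    break
                chunks.put(data)  # blocks while the buffer is full
        except Exception as exc:
            errors.append(exc)
        finally:
            chunks.put(None)  # sentinel: no more data will arrive

    thread = threading.Thread(target=reader)
    thread.start()

    total = 0
    try:
        while True:
            data = chunks.get()  # blocks while the buffer is empty
            if data is None:
                break
            dst.write(data)
            total += len(data)
    except BaseException:
        # Writing failed: tell the reader to stop and drain until its sentinel.
        stop.set()
        while chunks.get() is not None:
            pass
        raise
    finally:
        thread.join()

    if errors:
        raise errors[0]
    return total
```

When either side is slower than the other, the queue fills up or runs dry and the faster thread waits, so a transfer-rate graph that dips now and then does not by itself tell you who is doing the buffering.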
Whether performance is better or worse depends a lot on your system, hardware, drivers, filesystems, antivirus, and other aspects of the devices involved. (And sometimes it won't work at all as some filesystems or devices aren't tested properly with it, which is why it's off by default.)
Non-buffered I/O is not about copy speed (although it can be faster or slower, depending on lots of complex variables, that isn't the purpose of it). It's about avoiding large file copies pushing all the other data the filesystem is caching out of the cache.
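As a rough illustration of that cache-eviction effect, here is a toy Python model. It assumes a plain least-recently-used cache with a fixed number of blocks, which is a big simplification of what the real Windows cache manager does; the class, function, and numbers are all invented for the example.

```python
from collections import OrderedDict


class ToyLRUCache:
    """A toy LRU cache of fixed capacity (NOT the real filesystem cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def touch(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
        else:
            self.blocks[key] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


def surviving_blocks(cache_blocks, other_blocks, copy_blocks, copy_uses_cache):
    """Count how many previously cached blocks survive a large copy."""
    cache = ToyLRUCache(cache_blocks)
    for i in range(other_blocks):
        cache.touch(("other", i))  # data that was cached before the copy
    if copy_uses_cache:
        for i in range(copy_blocks):
            cache.touch(("copy", i))  # buffered copy passes through the cache
    return sum(1 for i in range(other_blocks) if ("other", i) in cache.blocks)


# Hypothetical sizes: 100-block cache, 80 blocks of other data, 500-block copy.
print(surviving_blocks(100, 80, 500, copy_uses_cache=True))   # 0
print(surviving_blocks(100, 80, 500, copy_uses_cache=False))  # 80
```

In this model the copy takes the same number of steps either way; the only difference is whether the other cached data is still there afterwards.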
One way to see if non-buffered I/O is working is to look at something that can show you the system's "standby" memory allocation. That is memory that is usually in use for filesystem caching but can be freed for more important purposes if it becomes needed. RamMap from Microsoft/SysInternals is one such tool.
You can also use Process Monitor which will tell you whether file-open / read-file / write-file requests are using buffered or non-buffered I/O.
There are also a lot of things which will block the use of non-buffered I/O (some choices we made in Opus, some limitations of the OS). Just turning it on won't always be enough to make it happen with a particular copy job. But you aren't testing/measuring it properly so we can't see if it's on or off, in Opus or XCopy. Looking at copy speed isn't the way to judge if the copy is using buffered or non-buffered I/O.
Usually, there is no reason to mess with this stuff. The defaults work fine for most situations.