Tools > Find Files + Containing Text: UTF-8 w/o BOM incorrect

Hello.

In the latest version, DO 11, I have a problem with Tools > Find Files when using the Containing Text option: the search does not work on UTF-8 w/o BOM text files that contain Cyrillic text (for example, PHP files). Searching for Cyrillic text works only if the files are encoded in UTF-8 (with BOM) or Windows-1251.

In earlier versions (e.g. 9.5) there was no such problem: the search worked correctly with Cyrillic text files in UTF-8, UTF-8 w/o BOM and Windows-1251.

Which files isn't it working with?

Can we have some example files to test with?


TEST DO search.rar (306 Bytes)

Thank you.

The Simple Find panel requires a BOM to recognise files as Unicode, but the Advanced panel has an option to assume UTF-8 on files with no BOM.


Thanks :slight_smile:

I checked this option, but everything remained the same: 2 files in the results.

The same two files, or is it now matching UTF-8.txt and UTF-8 wo BOM.txt but not Windows-1251.txt?

It would make sense for the 1251 file to not be matched when assuming UTF-8 with no BOM, since that means both files that lack a BOM will be interpreted as UTF-8, and only one actually is UTF-8.
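To illustrate why the Windows-1251 file drops out, here is a minimal Python sketch. It is not how Directory Opus works internally; it only compares how the same Cyrillic word is stored in the two encodings, and the names `word` and `decodes_as_utf8` are made up for the example.

```python
# Illustration only: the same Cyrillic word stored in the two encodings.
word = "текст"

utf8_bytes = word.encode("utf-8")      # two bytes per Cyrillic letter
cp1251_bytes = word.encode("cp1251")   # one byte per Cyrillic letter


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(utf8_bytes), len(cp1251_bytes))  # 10 5
print(decodes_as_utf8(utf8_bytes))         # True
print(decodes_as_utf8(cp1251_bytes))       # False
print(utf8_bytes in cp1251_bytes)          # False
```

The Windows-1251 bytes are not valid UTF-8 and do not contain the UTF-8 byte sequence for the word, so a search that treats every BOM-less file as UTF-8 has nothing to match in that file.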

You could add a second clause which searches for the same string with the UTF-8 option turned off in order to catch all three cases.

Thanks.
I am satisfied that I know how to handle UTF-8 wo BOM now.

  1. Yes, UTF-8.txt and UTF-8 wo BOM.txt but not Windows-1251.txt.
  2. I noticed that after adding the second clause, only the latter condition is fulfilled. Is that because the search must be performed in two steps, activating the conditions one after the other?

Regarding 2: make sure it is set up like this:

Contains, Match, текст, [x] Assume UTF-8 without BOM
OR
Contains, Match, текст, [ ] Assume UTF-8 without BOM

Then the overall filter will match the file if the first or second Contains... line matches the file.

They are checked in order, so if the first one matches the second one does not need to be tested. If the first one does not match, the second will be tested.

On the other hand, if you change the OR to AND, both lines have to match, and you will get only UTF-8.txt in the results, since it is the only file of the three that matches both lines.
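The OR and AND cases can be sketched in Python. This is a rough model under stated assumptions, not Opus's actual code: the `FILES` contents stand in for the three test files, `contains` is a hypothetical helper, and it assumes that a file without a BOM is read as Windows-1251 when the UTF-8 option is off.

```python
import codecs

WORD = "текст"

# Hypothetical stand-ins for the three test files from this thread.
FILES = {
    "UTF-8.txt": codecs.BOM_UTF8 + WORD.encode("utf-8"),
    "UTF-8 wo BOM.txt": WORD.encode("utf-8"),
    "Windows-1251.txt": WORD.encode("cp1251"),
}


def contains(data: bytes, needle: str, assume_utf8: bool) -> bool:
    """Model of one Contains clause (assumed behaviour)."""
    if data.startswith(codecs.BOM_UTF8):
        # A BOM identifies the file as UTF-8 regardless of the option.
        text = data.decode("utf-8-sig", errors="replace")
    elif assume_utf8:
        text = data.decode("utf-8", errors="replace")
    else:
        # Assumption: no BOM and option off means the ANSI code page, here cp1251.
        text = data.decode("cp1251", errors="replace")
    return needle in text


# OR: the second clause is only evaluated when the first one fails.
or_matches = [name for name, data in FILES.items()
              if contains(data, WORD, True) or contains(data, WORD, False)]

# AND: a file is kept only when both clauses match.
and_matches = [name for name, data in FILES.items()
               if contains(data, WORD, True) and contains(data, WORD, False)]

print(or_matches)   # ['UTF-8.txt', 'UTF-8 wo BOM.txt', 'Windows-1251.txt']
print(and_matches)  # ['UTF-8.txt']
```

In this model a single pass over the files covers both clauses, so no second search run is needed to get all three files with OR.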

Thanks.

I understand that the Clear previous results checkbox must be disabled, and then both clauses are applied.

No, you almost always want Clear previous results to be enabled.

Both clauses are considered (if needed) by a single search. I am not sure where you are getting the idea that only one is ever considered.