Opening a comma-delimited CSV file in Excel

I have a button that opens files with Excel 2016. It has just one line:
Excel {f!}
When I open any comma-delimited CSV file with this button, the whole of each line, including all the commas, is in the left-hand cell of the Excel spreadsheet. Of course I can sort this out with Data --> Text to Columns, or I can close Excel, reopen it and open the most recent file --- both methods launch the necessary dialogue.

Which all takes ages. Apparently M$ has made Windows so extraordinarily smart and intelligent that there is now no command that will open the CSV file with a given delimiter, or even with a default delimiter. (I tried adding sep=, or "sep=," to the command, but without success.)

I can add an initial line sep=, or "sep=," to the CSV file --- this works --- and I could easily automate that process with a button. But I don't particularly want to edit the CSV file (although perhaps if I rename it, or am careful never to save it, that would be an acceptable approach --- it just seems very clumsy to alter the file that you are opening).
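For illustration (made-up contents, not from one of my real files), the edited file would then start like this, with the directive on its own first line:

    sep=,
    first field,second field,third field
    more data,in the,second row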

QUESTION: Is there some DOpus way, button or script, to tell Excel by a command to open the CSV file using a comma as delimiter? Perhaps Excel can be told to read sep=, before it reads the file. It's hardly an unreasonable demand on "the world's best spreadsheet programme" --- no such problems with NirSoft's CSVFileView.exe.

For reference, I'm on a 64-bit Windows 10 Pro 1709 desktop, my language is English (UK), my keyboard is US, my list separator is a comma, my decimal separator is a full stop, and there is a kookaburra laughing at me outside.

Excel should open a CSV like that without extra work. My guess is that something about the file makes Excel not think it is a CSV, or think all the data belongs in a single column. Maybe it has quotes around every line, or the commas aren't actually commas but another character that looks similar.

This is really an Excel question since Opus just sends the filename to Excel, and that part is working. You might find people with more ideas at another forum, if no one here can suggest anything.

If you want, upload a sample file (assuming there's nothing private in it) and I'll see if there's anything about the file contents that looks like it's causing the problem.

I've attached my shortest such CSV file. The commas are surely the correct commas because it was generated by a DOpus script that lists, in comma-separated CSV form, all the calls to EPS files in all selected TeX files. Each row records:
an index for sorting, the TeX file, the line, the directory, the EPS file.
Excel opens it wrongly as I described above, but if I add the initial line Sep=, it opens correctly.
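Purely as a made-up illustration of that layout (not a row from the attached file), a line looks something like:

    0001,chapter2.tex,137,D:\Thesis\Figures,diagram3.eps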

EPSFiles20180405.zip (487 Bytes)

I also have downloaded CSV files, from banks and elsewhere, and they have exactly the same problem. I got the Sep=, idea from web pages such as https://superuser.com/questions/606272/how-to-get-excel-to-interpret-the-comma-as-a-default-delimiter-in-csv-files/606274

I tried this approach only after checking that the list separator and decimal separator were correctly set, as in my first post (perhaps I should add that my digit-grouping symbol is a space).

I could use a script to automate creating a new file with Sep=, on the first line, but then I would have files doubled up, and anyway there would be timing issues. I could create the new file on a RAM drive, but then any subsequent XLSX file would not be saved by default in the current directory. DOpus is so flexible that it is always worth asking if it can provide a solution, even for the rigidities of Office.
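For what it's worth, the doubled-up-file idea would only take a few lines of JScript (a rough sketch, paths invented; it assumes an ANSI file --- the optional 'format' argument of OpenTextFile/CreateTextFile would be needed for Unicode):

    var fso = new ActiveXObject("Scripting.FileSystemObject");
    var src = fso.OpenTextFile("D:\\Example\\Bank.csv", 1);          // 1 = ForReading
    var dst = fso.CreateTextFile("D:\\Example\\Bank-sep.csv", true); // overwrite if it already exists
    dst.WriteLine("sep=,");                                          // the Excel delimiter directive as the first line
    dst.Write(src.ReadAll());                                        // copy the original contents unchanged
    src.Close();
    dst.Close();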

Why doesn't Excel have a proper command line, like every other programme on the planet?

That .csv file is UTF16. If you copy the contents and paste them into a new, normal text file with .csv extension, Excel will load it fine.

It seems Excel does support UTF16, but is rather stupid about it. (Typical Microsoft Office, really. :slight_smile:)

From Microsoft Answers Opening CSV file with UTF16 encoding in Excel 2010:

Excel does support UTF16 CSV Files. But only if the TABULATOR character is used as a delimiter, then Excel separates the columns correctly when opening the .csv File.

Leo, it works --- you're a genius! (Now I also understand why my recent attempts to join two text files resulted in Chinese characters one way around, and nonsense the other way around.)

So next questions:

  1. How do I make my DOpus script output a UTF8 text file and not a UTF16 text file? I've had a brief look at the DOpus Help files, but I'm a little out of my depth.

  2. Is there an easier way than using a hex editor to see if a text file is UTF8 or UTF16? If so, it may even be possible to add an optional column to the lister display.

  3. Is there a simple way to convert a text file from UTF16 to UTF8? The answer to question 1 should provide a read-in-then-output method using a very short DOpus script --- but there may be simpler methods, and there is third-party software.

  1. That depends how your script is writing the file/data at the moment.

  2. The Description column will tell you if a text file has a Unicode BOM at the start.

  3. For a single file, just open it in any decent text editor, copy it to the clipboard, then paste that into a new file (or into an Opus lister to create a new file; it'll use plain-text normally, at least unless something in the text requires unicode).

Thanks, Leo. I was tricked by the fact that JScript uses UTF16, and seems to have no option to use UTF8. Does the DOpus version of JScript have such an option?

  1. I created the file using
    var oFSO = new ActiveXObject ("Scripting.FileSystemObject")
    var oFile = oFSO.CreateTextFile (sFile, true, true)   // second parameter: overwrite; third: 'Unicode'
    The third parameter is documented only as 'Unicode' in the JScript CHM file. After some searching, it appears that JScript creates a UTF16 file, with no option for a UTF8 file.

  2. Got it! All my DOpus-generated TXT and CSV files seem to be UTF-16LE. Can I assume that DOpus is looking only at the BOM at the beginning? (A scripted check of the first few bytes is sketched after this list, for reference.)

  3. I am trying to automate the process because I am constantly coming up against the problem. If I could solve question 1, I would have a solution to this question.
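For reference, here is a scripted way to check those first bytes instead of using a hex editor --- just an illustrative sketch (not from the thread), run inside a DOpus script so DOpus.Output goes to the script log, and assuming the windows-1252 charset passes the relevant bytes through one-to-one:

    var st = new ActiveXObject("ADODB.Stream");
    st.Type = 2;                               // adTypeText
    st.Charset = "windows-1252";               // decodes each byte to the same code point for the bytes tested below
    st.Open();
    st.LoadFromFile("D:\\Example\\Test.csv");  // hypothetical path
    var head = st.ReadText(3);                 // first three characters = first three bytes
    st.Close();
    if (head.charCodeAt(0) == 0xFF && head.charCodeAt(1) == 0xFE)
        DOpus.Output("UTF-16LE BOM");
    else if (head.charCodeAt(0) == 0xEF && head.charCodeAt(1) == 0xBB && head.charCodeAt(2) == 0xBF)
        DOpus.Output("UTF-8 BOM");
    else
        DOpus.Output("No BOM found");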

The Windows FileSystemObject that you're using has that limitation. (It's not a JScript limitation. The same is true if you use that object from VBScript.)

"Unicode" in Windows contexts usually means UTF16LE, but it's an overloaded term, of course.

A quick web search finds this suggestion (in VBScript, but the same technique should work in JScript): http://developer.rhino3d.com/guides/rhinoscript/read-write-utf8/

Tab-separated is a better option in general anyway, since you don't need special handling for strings with commas in them.
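To illustrate the tab-separated route (my own sketch, not from the thread; path and data invented): per the Microsoft Answers quote above, the same FileSystemObject 'Unicode' (UTF-16LE) file that Excel mangles with commas splits correctly when the fields are separated by tabs:

    var fso = new ActiveXObject("Scripting.FileSystemObject");
    var f = fso.CreateTextFile("D:\\Example\\Test.csv", true, true); // overwrite = true, Unicode (UTF-16LE) = true
    f.WriteLine("Index\tTeX file\tLine\tDirectory\tEPS file");       // tab-separated header
    f.WriteLine("1\tchapter1.tex\t42\tD:\\Figures\tplot1.eps");
    f.Close();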

The whole thing is a real confusion --- the sort of thing that gives computing a bad name --- but I now know what to do.

A. I followed the link on ADO Streams, and I can confirm that the following translation into JScript produces a UTF-8 CSV file that Excel opens properly:

var objStream = new ActiveXObject ("ADODB.Stream")
objStream.CharSet = "utf-8"                                       // text stream (the default Type) encoded as UTF-8
objStream.Open ()
objStream.WriteText ("The data, I want, in utf-8")
objStream.WriteText ("\r\nMore data, for the, second row")
objStream.SaveToFile ("D:\\DataForMunsalvaesche\\Test.csv", 2)    // backslashes doubled for JScript; 2 = adSaveCreateOverWrite
objStream.Close ()

Strange that ADO streams are not documented in the JScript CHM file. I found some documentation at
https://www.w3schools.com/asp/ado_ref_stream.asp
though it is written for ASP/VBScript rather than JScript: Open and Close are listed as 'Methods', and there it doesn't matter whether or not they are followed by ().

B. The resulting CSV file has a BOM. When I remove it, Excel still opens the file correctly. But the 'Description' field in DOpus is now empty, confirming that DOpus is reading the BOM.
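For anyone who wants to avoid writing the BOM in the first place, a common trick (an illustrative sketch only, with a made-up path) is to rewind the UTF-8 stream, switch it to binary, skip the first three bytes, and copy the rest into a second stream before saving:

    var txt = new ActiveXObject("ADODB.Stream");
    txt.Charset = "utf-8";
    txt.Open();
    txt.WriteText("The data, I want, in utf-8\r\n");
    txt.Position = 0;                            // rewind (Type can only be changed at position 0)
    txt.Type = 1;                                // adTypeBinary: Position now counts bytes
    txt.Position = 3;                            // skip the three-byte UTF-8 BOM (EF BB BF)
    var bin = new ActiveXObject("ADODB.Stream");
    bin.Type = 1;                                // adTypeBinary
    bin.Open();
    txt.CopyTo(bin);                             // everything after the BOM
    bin.SaveToFile("D:\\Example\\Test.csv", 2);  // 2 = adSaveCreateOverWrite
    bin.Close();
    txt.Close();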

C. If I use a hex editor to remove the BOM from the earlier UTF-16 CSV file --- lo! --- it opens correctly in Excel (and, as expected, the Description field is empty in DOpus). So that is another approach (and I have since found Leo's script, and other people's developments of it, on the DOpus forum to automate this removal of the BOM using blobs). And Jon's suggestion to use tab delimiters is another approach.

Thanks, Leo and Jon. Hopefully it will be useful to others to have this information about such arbitrary rules.

You can use this for buttons or commands which add, remove or toggle the UTF-8 BOM at the start of the selected file(s) (or a file specified on the command line):