Custom columns reduce multiple processing

I am looking to improve the efficiency of a custom column script (regexp columns) by reducing the recalculation of expensive columns.
I want to only calculate each column once, and only for the columns displayed.

My columns are set to multi-column mode. I have 14 columns configured, but only two selected.

I have access to an array of all 14 columns configured for the script, so I can try to calculate all the values at once, to reduce re-loading files. Not all columns will get a value for each file (as their regexp won't match); also, not all of them are selected, so they don't all need calculating.

The first time OnScriptColumn is called, scriptColData.columns.length has 14 items. Both selected columns are populated with values in this call.

OnScriptColumn is then called a second time, where scriptColData.columns.length has 7 items (I believe these are all the items that were not previously populated, because the regexp did not match).

I have improved the performance by removing any items from my column config list that are not in scriptColData.columns. This way, any previously calculated columns won't get recalculated in subsequent calls. However, if the second selected column didn't match an item, it will be in scriptColData.columns for the second call and thus be re-run, possibly re-opening the file.
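
As a rough sketch of that filtering step, assuming plain objects in place of the real Opus objects: `configured` is a hypothetical array of column configs, and `requested` is a plain object keyed by column name that stands in for scriptColData.columns (the real object is accessed differently). The names here are illustrative only and are not taken from the attached script.

```javascript
// Stand-in sketch: plain objects replace the real Opus objects.
// configured: array of { name: string } column configs (hypothetical shape).
// requested:  object keyed by column name, standing in for scriptColData.columns.
function filterRequested(configured, requested) {
    var remaining = [];
    for (var i = 0; i < configured.length; i++) {
        var cfg = configured[i];
        // Keep only the columns that were asked for in this call.
        if (Object.prototype.hasOwnProperty.call(requested, cfg.name)) {
            remaining.push(cfg);
        }
    }
    return remaining;
}

// Example: three configured columns, only two requested in this call.
var configured = [{ name: "Year" }, { name: "Artist" }, { name: "Title" }];
var requested = { Artist: {}, Title: {} };
var remaining = filterRequested(configured, requested);
// remaining holds the "Artist" and "Title" configs, in configured order.
```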

  1. For multicol columns, why is OnScriptColumn called multiple times? Especially if the target column (scriptColData.col) already has a value?

  2. Should scriptColData have all the possible columns for the script, or just the ones that are selected? Could scriptColData get a new property listing the selected columns?

For my use case, you might have columns configured to extract data from a file name and from the file contents. If you only display the column that works on the file name, it's still going to load the file. And if the second column does not get a value the first time OnScriptColumn is called, then the second time it is called I will open the file again and still not get a value.

I have attached a copy of the script that has more logging than the one in the main thread.
Regards

RegExpColumnsv 1.6.1.js.txt (36.5 KB)

If a file doesn't match, try setting the result to an empty string, instead of not setting it at all. (Unless you're doing that already. Script is 1000 lines long so I didn't look closely.)
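
A minimal sketch of that suggestion, assuming a hypothetical helper `extractOrEmpty` and a plain object `col` standing in for one column entry (neither comes from the attached script):

```javascript
// Hypothetical helper: returns the first capture group, or "" when the
// pattern does not match, so the column always receives a value.
function extractOrEmpty(pattern, text) {
    var m = pattern.exec(text);
    return (m && m[1] !== undefined) ? m[1] : "";
}

// col stands in for one column entry (assumed shape, not the real Opus object).
var col = { value: undefined };
col.value = extractOrEmpty(/\((\d{4})\)/, "Holiday (2019).jpg"); // "2019"
col.value = extractOrEmpty(/\((\d{4})\)/, "Holiday.jpg");        // ""
```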

I created a test script to better understand what is happening. As you said, the other one is over 1000 lines, so it's a bit hard to follow. I also wanted to be able to create some test cases with detailed logging to see when and how columns were being calculated.

When using multi-column scripts, here are some points of interest.

  • DOpus will only run your script if at least one column is added to the Lister.
  • DOpus will request values for all of the script's custom columns in one call. One column will be the primary (current) column; the rest are secondary, and these may not have been added to the Lister. Because DOpus requests calculation for columns that are not currently selected/visible, it won't require recalculation if a user adds a new column. If the column calculation is fast this is fine; however, if one column is fast to calculate and one is slow, and only the fast column is visible, this might be a concern, as you might be calculating an expensive value that is never displayed.
  • DOpus will make a primary request for every column that is visible and not yet calculated. If a column is skipped, or fails to calculate, as a secondary, it will be re-requested as a primary. This can be prevented by setting the column to a value ('' or null is fine).

Due to how the regex columns work, and given that columns can be cheap or expensive to calculate, this is what I found works best.

  • Start with a list of all configured columns.
  • Remove any that were not requested by DOpus (they might have already been calculated).
  • Remove any that are not visible. I might try removing only the expensive columns (i.e. ones that read the file).
  • Skip all calculations if the primary column is not one of the ones remaining for calculation. This prevents attempting to recalculate the same column many times if the calculation fails.
  • If a result can't be calculated for a column, set it to null or ''.
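
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: plain objects stand in for the Opus scripting objects, the config shape (`name`, `calc`) is hypothetical, and how the script learns which columns are visible is not shown, so `visible` is simply passed in. It implements the "remove all non-visible columns" variant.

```javascript
// Sketch only: plain objects stand in for the Opus scripting objects.
// configured: array of { name, calc } column configs (hypothetical shape)
// requested:  object keyed by column name, standing in for scriptColData.columns
// visible:    object keyed by column name (how visibility is found is not shown)
// primary:    name of the column being asked for explicitly
function calculateColumns(configured, requested, visible, primary, item) {
    var has = function (obj, key) {
        return Object.prototype.hasOwnProperty.call(obj, key);
    };
    var todo = [];
    var primaryPending = false;
    for (var i = 0; i < configured.length; i++) {
        var cfg = configured[i];
        if (!has(requested, cfg.name)) continue; // not requested, may already be calculated
        if (!has(visible, cfg.name)) continue;   // not visible, so not needed now
        todo.push(cfg);
        if (cfg.name === primary) primaryPending = true;
    }
    // Skip all work if the primary column is not among those left to calculate.
    if (!primaryPending) return [];
    var done = [];
    for (var j = 0; j < todo.length; j++) {
        var result = todo[j].calc(item);
        // Always assign something: '' when no result could be calculated.
        requested[todo[j].name].value =
            (result === null || result === undefined) ? "" : result;
        done.push(todo[j].name);
    }
    return done;
}

// Example: three configured columns, two requested, one visible.
var cfgs = [
    { name: "Name",    calc: function (item) { return item.name.toUpperCase(); } },
    { name: "Content", calc: function (item) { return null; } },
    { name: "Size",    calc: function (item) { return 1; } }
];
var columns = { Name: {}, Content: {} };
var done = calculateColumns(cfgs, columns, { Name: true }, "Name", { name: "a.txt" });
// done lists only "Name"; columns.Name.value is "A.TXT"; Content is left unset.
```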

The test script does these things and has quite verbose logging to explain what is happening: which column is being calculated and which is being skipped. It might be of interest to others who are looking to create multi-column scripts. It has preconfigured columns for different use cases. These always return specific values, i.e. a populated string, a null, an empty result "", or no result at all.
TestColumns.js.txt (6.9 KB)

That sounds right. As a general rule, if using multi-column mode, it makes sense to populate the primary column you're asked for explicitly, and then to also fill in any other secondary columns if calculating the primary one also means you have their data for free (or if they're trivial to calculate).

Expensive secondary columns can be skipped, and Opus will call the script for them explicitly if they are really needed.
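
A small sketch of that rule, under the same caveats as before: plain objects stand in for the Opus objects, and the `expensive` flag, `fillColumns`, and the `fileReads` counter are hypothetical names used only to illustrate the idea.

```javascript
// Sketch only: plain objects stand in for the Opus scripting objects.
function hasOwn(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// configs:   object keyed by column name -> { expensive, calc } (hypothetical shape)
// requested: object keyed by column name, standing in for scriptColData.columns
// primary:   name of the column being asked for explicitly
function fillColumns(configs, requested, primary, item) {
    for (var name in requested) {
        if (!hasOwn(requested, name) || !hasOwn(configs, name)) continue;
        var cfg = configs[name];
        // Expensive secondaries are left unset so they can be requested later.
        if (name !== primary && cfg.expensive) continue;
        var result = cfg.calc(item);
        requested[name].value =
            (result === null || result === undefined) ? "" : result;
    }
}

var fileReads = 0; // counts simulated file loads
var configs = {
    Name:    { expensive: false, calc: function (item) { return item.name.length; } },
    Content: { expensive: true,  calc: function (item) { fileReads++; return "body"; } }
};

var firstCall = { Name: {}, Content: {} };
fillColumns(configs, firstCall, "Name", { name: "a.txt" });
// firstCall.Name.value is 5; Content is untouched and fileReads is still 0.

var secondCall = { Content: {} };
fillColumns(configs, secondCall, "Content", { name: "a.txt" });
// Now the expensive column is the primary, so it is calculated: fileReads is 1.
```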