Format of NTFS description stream

Uberman · November 17, 2016, 5:16pm

Can I please get an idea of the format of the NTFS file stream DOpus creates for a file? I am writing a separate utility (in Python) that runs from the command line and supports "descript.ion" files, and I want it to also support DOpus file descriptions.

I am currently "hacking" the file to get at the description text, and it appears to be working:

05/31/16 07:07:28           <DIR> --hs---- $RECYCLE.BIN
05/31/16 07:07:25           <DIR> --hs---- System Volume Information
11/06/16 14:07:18  22,001,731,072 -------- Sikun_diff_b1_s10_v1.tib
11/07/16 14:15:29  22,328,659,968 -------- Sikun_diff_b1_s11_v1.tib
11/08/16 14:30:17  22,623,965,696 -------- Sikun_diff_b1_s12_v1.tib
11/09/16 14:34:41  25,399,720,960 -------- Sikun_diff_b1_s13_v1.tib
11/10/16 14:16:53  25,457,882,112 -------- Sikun_diff_b1_s14_v1.tib
11/11/16 14:09:44  25,554,313,216 -------- Sikun_diff_b1_s15_v1.tib
11/12/16 14:25:29  25,726,647,296 -------- Sikun_diff_b1_s16_v1.tib
11/13/16 14:11:30  28,972,737,024 -------- Sikun_diff_b1_s17_v1.tib
11/14/16 14:08:46  26,522,119,168 -------- Sikun_diff_b1_s18_v1.tib
11/15/16 14:09:05  26,574,633,984 -------- Sikun_diff_b1_s19_v1.tib
11/16/16 14:30:59  23,591,707,136 -a------ Sikun_diff_b1_s20_v1.tib ---> Installed Project Professional 2013
10/29/16 15:09:03   2,056,237,568 -------- Sikun_diff_b1_s2_v1.tib
10/30/16 15:26:59   2,486,776,832 -------- Sikun_diff_b1_s3_v1.tib
10/31/16 15:09:45   2,750,278,656 -------- Sikun_diff_b1_s4_v1.tib
11/01/16 15:14:37  13,767,406,592 -------- Sikun_diff_b1_s5_v1.tib
11/02/16 15:44:36  54,023,253,504 -------- Sikun_diff_b1_s6_v1.tib
11/03/16 15:01:51  22,513,594,880 -------- Sikun_diff_b1_s7_v1.tib
11/04/16 15:12:28  21,946,804,736 -------- Sikun_diff_b1_s8_v1.tib
11/05/16 15:33:13  22,068,392,960 -------- Sikun_diff_b1_s9_v1.tib
10/28/16 20:20:49 852,553,800,704 -------- Sikun_full_b1_s1_v1.tib
   1,268,920,664,064 bytes in 20 files and 2 dirs
   2,531,587,006,464 bytes free

"Sikun_diff_b1_s20_v1.tib" has a DOpus description applied, and as you can see, my utility is successfully extracting it, but I'm kind of guessing at where the Unicode description is within the data stream. I see the BOM at the head, but I'm not really that interested in the raw data before the description text. I'd just like to know if there's a more reliable means of determining the offset of that text for my extraction purposes.

Thanks.

Leo · November 17, 2016, 6:27pm

For files, we use the Windows Summary Information property set API and the PIDSI_COMMENTS property.

That uses the \005SummaryInformation NTFS alternate data stream (ADS). Sample code to read the comment:

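// Windows-only: uses the COM structured-storage API (<windows.h>, link with ole32.lib).
// pPath and strComment are not defined in this excerpt (presumably Opus's own string classes).
BOOL fOk = FALSE;
HRESULT hr;

// Open the file's property set storage; STGFMT_ANY lets Windows pick the right implementation.
IPropertySetStorage* pStg = 0;
if (SUCCEEDED(hr = StgOpenStorageEx(pPath.GetString(), STGM_SHARE_EXCLUSIVE | STGM_READ, STGFMT_ANY, 0, 0, 0,
	IID_IPropertySetStorage, reinterpret_cast<void**>(&pStg))))
{
	// Open the Summary Information property set (the \005SummaryInformation stream).
	IPropertyStorage* pSet = 0;
	if (SUCCEEDED(hr = pStg->Open(FMTID_SummaryInformation, STGM_SHARE_EXCLUSIVE | STGM_READ, &pSet)))
	{
		// Request a single property by ID: PIDSI_COMMENTS.
		PROPSPEC ps;
		PROPVARIANT pv;

		ps.ulKind = PRSPEC_PROPID;
		ps.propid = PIDSI_COMMENTS;

		PropVariantInit(&pv);
		if (SUCCEEDED(hr = pSet->ReadMultiple(1, &ps, &pv)))
		{
			// Copy the PROPVARIANT's string into strComment; succeed only if it is non-empty.
			if (strComment.FromPropVariant(&pv) && !strComment.Empty())
				fOk = TRUE;
		}
		// Free any memory the PROPVARIANT owns, then release the COM interfaces.
		PropVariantClear(&pv);
		pSet->Release();
	}
	pStg->Release();
}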

(This isn't always used: for file types that have their own way of storing descriptions inside the file itself, in formats Opus knows how to write, the description may go there instead. For example, MP3 ID3 tags and JPG EXIF tags can store descriptions as well.)
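If you want to avoid the COM API (for example, to port the logic to a Python tool), the stream can also be parsed as raw bytes. The sketch below is based on the published OLE property set serialization format (MS-OLEPS), not on anything Opus-specific. It assumes little-endian data, assumes the Summary Information set is the first (usually the only) set in the stream, and widens 8-bit code-page strings byte by byte, which is only correct for ASCII. The two bytes FE FF at the head of the stream are most likely not a text BOM but the header's ByteOrder field (0xFFFE stored little-endian). `ReadSummaryComment` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

namespace {

constexpr std::uint32_t kPidCodePage = 1;   // PID_CODEPAGE
constexpr std::uint32_t kPidComments = 6;   // PIDSI_COMMENTS
constexpr std::uint16_t kVtI2 = 0x0002;     // VT_I2
constexpr std::uint16_t kVtLpstr = 0x001E;  // VT_LPSTR
constexpr std::uint16_t kVtLpwstr = 0x001F; // VT_LPWSTR
constexpr std::uint16_t kCpUtf16 = 1200;    // CP_WINUNICODE

// Bounds-checked little-endian readers; nullopt if the read would run past the end.
std::optional<std::uint16_t> ReadU16(const std::vector<std::uint8_t>& b, std::size_t off) {
    if (off > b.size() || b.size() - off < 2) return std::nullopt;
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

std::optional<std::uint32_t> ReadU32(const std::vector<std::uint8_t>& b, std::size_t off) {
    if (off > b.size() || b.size() - off < 4) return std::nullopt;
    return static_cast<std::uint32_t>(b[off]) | (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) | (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

}  // namespace

// Returns the PIDSI_COMMENTS text as UTF-16 code units, or nullopt if absent or malformed.
std::optional<std::u16string> ReadSummaryComment(const std::vector<std::uint8_t>& s) {
    // Stream header: ByteOrder(2) Version(2) SystemIdentifier(4) CLSID(16)
    // NumPropertySets(4), then FMTID0(16) and Offset0(4) at byte 44.
    auto order = ReadU16(s, 0);
    if (!order || *order != 0xFFFE) return std::nullopt;
    auto setOffset = ReadU32(s, 44);  // assumes set 0 is FMTID_SummaryInformation
    if (!setOffset) return std::nullopt;
    const std::size_t base = *setOffset;

    // Property set: Size(4) NumProperties(4), then (PropertyID, Offset) pairs;
    // each Offset is relative to the start of the property set.
    auto count = ReadU32(s, base + 4);
    if (!count) return std::nullopt;
    std::uint16_t codePage = 0;
    std::optional<std::size_t> commentAt;
    for (std::uint32_t i = 0; i < *count; ++i) {
        auto pid = ReadU32(s, base + 8 + std::size_t{i} * 8);
        auto rel = ReadU32(s, base + 12 + std::size_t{i} * 8);
        if (!pid || !rel) return std::nullopt;
        if (*pid == kPidCodePage) {
            auto vt = ReadU16(s, base + *rel);
            auto cp = ReadU16(s, base + *rel + 4);  // value follows Type(2) + Padding(2)
            if (vt && cp && *vt == kVtI2) codePage = *cp;
        } else if (*pid == kPidComments) {
            commentAt = base + *rel;
        }
    }
    if (!commentAt) return std::nullopt;

    // Typed value: Type(2) Padding(2), then a 4-byte size/length field and the characters.
    auto vt = ReadU16(s, *commentAt);
    auto n = ReadU32(s, *commentAt + 4);
    if (!vt || !n) return std::nullopt;
    const std::size_t data = *commentAt + 8;
    std::u16string out;
    if (*vt == kVtLpwstr || (*vt == kVtLpstr && codePage == kCpUtf16)) {
        // VT_LPWSTR counts characters; VT_LPSTR under code page 1200 counts bytes.
        const std::size_t units = (*vt == kVtLpwstr) ? *n : *n / 2;
        for (std::size_t i = 0; i < units; ++i) {
            auto ch = ReadU16(s, data + i * 2);
            if (!ch) return std::nullopt;
            if (*ch == 0) break;  // stop at the null terminator
            out.push_back(static_cast<char16_t>(*ch));
        }
    } else if (*vt == kVtLpstr) {
        // 8-bit code page: byte-by-byte widening is only correct for ASCII (assumption).
        for (std::size_t i = 0; i < *n; ++i) {
            if (data + i >= s.size()) return std::nullopt;
            if (s[data + i] == 0) break;
            out.push_back(static_cast<char16_t>(s[data + i]));
        }
    } else {
        return std::nullopt;  // some other property type
    }
    return out;
}
```

On Windows, the stream can be opened like an ordinary file by appending `:\x05SummaryInformation` to the file name (the usual NTFS `file:stream` syntax) and read in binary mode; the same path should also work with Python's `open(..., "rb")`, and the offsets above map directly onto Python's `struct` module.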

For folders (and also for files, though I think only for non-comment data such as Opus-specific labels), we use our own format, stored in the \007OpusMetaInformation NTFS ADS stream.

The start of that data will be this structure (all DWORD and int fields are 4 bytes):

	DWORD		omd_dwSize;
	DWORD		omd_dwFlags;
	int			omd_iRating;
	DWORD		omd_dwCommentSize;
	// ...more data, up to omd_dwSize bytes...
	// ...the size may vary depending on Opus version...

Immediately after that comes the comment string in WCHARs (two bytes per character), null-terminated, but only if omd_dwCommentSize is non-zero. (If it is zero, there is no string at all, not even a null.) Other data may follow.
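The layout above can be turned into a small byte-level parser. This is a minimal sketch with several assumptions not stated in the thread: values are little-endian, omd_dwSize counts bytes from the start of the stream, and because the unit of omd_dwCommentSize isn't given, the code only tests it for zero and finds the end of the string by its null terminator. `OpusMetaHeader` and `ParseOpusMeta` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for the documented leading fields plus the comment.
struct OpusMetaHeader {
    std::uint32_t size = 0;         // omd_dwSize
    std::uint32_t flags = 0;        // omd_dwFlags
    std::int32_t rating = 0;        // omd_iRating
    std::uint32_t commentSize = 0;  // omd_dwCommentSize (unit unknown; only checked for zero)
    std::u16string comment;         // empty when omd_dwCommentSize == 0
};

std::optional<OpusMetaHeader> ParseOpusMeta(const std::vector<std::uint8_t>& b) {
    // Little-endian 4-byte read; callers check bounds first.
    auto u32 = [&](std::size_t off) -> std::uint32_t {
        return static_cast<std::uint32_t>(b[off]) | (static_cast<std::uint32_t>(b[off + 1]) << 8) |
               (static_cast<std::uint32_t>(b[off + 2]) << 16) | (static_cast<std::uint32_t>(b[off + 3]) << 24);
    };
    if (b.size() < 16) return std::nullopt;  // need the four 4-byte fields
    OpusMetaHeader h;
    h.size = u32(0);
    h.flags = u32(4);
    h.rating = static_cast<std::int32_t>(u32(8));
    h.commentSize = u32(12);
    // Assumption: omd_dwSize is the full header length measured from the stream start.
    if (h.size < 16 || h.size > b.size()) return std::nullopt;
    if (h.commentSize == 0) return h;  // no string at all, not even a null
    // The comment starts right after the header: UTF-16LE up to a 0x0000 terminator.
    for (std::size_t off = h.size;; off += 2) {
        if (off + 1 >= b.size()) return std::nullopt;  // unterminated: treat as malformed
        const char16_t ch = static_cast<char16_t>(b[off] | (b[off + 1] << 8));
        if (ch == 0) break;
        h.comment.push_back(ch);
    }
    return h;
}
```

Because the header size may vary between Opus versions, the sketch takes the comment offset from omd_dwSize rather than hard-coding it to 16 bytes.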

Uberman · November 18, 2016, 3:55am

Wow, much more than I was expecting. Thank you so much, Leo!