File tagging/searching (Was: Now, if DOpus did THIS...)

Request? Fantasy? Dreaming? I don't know. Read on...

What I want---no no, what I NEED, is del.icio.us for my files. Has anyone used, I mean really USED del.icio.us? Have you "gotten" the tag INTERSECTION feature? Have you understood it? THIS is what I need for my files.

Say, I have a paper in Word (.doc) format. It's a paper on philosophy. The emphasis of the paper is on Greek philosophy, and in that Greek section, 50% of what it discusses is Aristotle. The rest are other Greek philosophers. It continues with a discussion of Eastern philosophy/ers as well as modern (later) philosophy. Out of all that, the emphasis of the discussion is on Ethics.

Ok, so now.. WHERE do I file this thing?? Under Greek philosophy? But it's not only about that...

Under Aristotle? But it's not only about him...

Under philosophy? That's too general to be useful. I would end up having HUNDREDS of "philosophy" files in that folder.

Under Ethics? But it's not only about that...

Same with music, for example. How many of you have difficulty organizing LARGE music collections?? Is this song blues, rock, prog rock, hard rock? Well, it's ALL of that! It's got elements of all of these genres.

You can see where I am going with this. I want to use TAGS (or categories or labels or whatever you want to call it). I want to tag it with multiple tags, and then have a way to BROWSE my files based on tags.

Sure, we start talking about file-system stuff here, and I know that Microsoft is working on some related things with WinFS, but they won't have it ready until 2007 at the earliest. (Also, what they are doing is a lot more ambitious and complex than just this.)

The way I imagine this to work, is having a database. A simple database that would keep track of the file and its tags. I could build that today. In Access even. That's not the issue.
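A minimal sketch of such a database, assuming SQLite and Python rather than Access. The table names, column names, and both helper functions are made up for illustration; nothing here is something DOpus provides. It keeps one row per file, one row per tag, and a link table between them, and answers the "intersection" question (files carrying ALL of the given tags) with a single query:

```python
import sqlite3

# Hypothetical schema: every table and column name is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE file_tags (
        file_id INTEGER NOT NULL REFERENCES files(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (file_id, tag_id)
    );
""")

def tag_file(path, *names):
    """Record a file (if new) and attach each tag name to it."""
    conn.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    file_id = conn.execute(
        "SELECT id FROM files WHERE path = ?", (path,)).fetchone()[0]
    for name in names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO file_tags VALUES (?, ?)", (file_id, tag_id))

def files_with_all(*names):
    """Return the paths of files that carry every one of the given tags."""
    names = sorted(set(names))            # ignore accidental duplicates
    placeholders = ",".join("?" * len(names))
    rows = conn.execute(
        f"""SELECT f.path FROM files f
            JOIN file_tags ft ON ft.file_id = f.id
            JOIN tags t ON t.id = ft.tag_id
            WHERE t.name IN ({placeholders})
            GROUP BY f.id
            HAVING COUNT(DISTINCT t.id) = ?
            ORDER BY f.path""",
        (*names, len(names)),
    )
    return [row[0] for row in rows]

tag_file("paper.doc", "philosophy", "greek", "aristotle", "ethics")
tag_file("notes.doc", "philosophy", "ethics")
print(files_with_all("aristotle", "ethics"))
```

The `HAVING COUNT(DISTINCT ...)` clause is what turns an "any of these tags" match into an "all of these tags" match.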

The issue is, having a FILE MANAGER that understands and interacts DIRECTLY with that database in real-time and in the background. The user wouldn't even know it's happening. Man, what a GREAT feature that would be. Simply FANTASTIC.

I also think that this is the BEST time for DOpus to introduce such a feature. del.icio.us, tagsonomies, tag-based and label-based systems (such as Gmail) are all the rage RIGHT NOW (just do a search in Google or in the blogosphere), people are introduced to them left and right, and I can tell you that, at least in the GEEK space, DOpus would become a MUST HAVE application.

In such a scenario, I wouldn't even care where I keep my files. They could all be kept in My Documents, for all I care. As long as I had a way to browse them via tags in a file manager that understands them, it wouldn't matter where they are!

Of course, if you lose that database with the tags, you are screwed :slight_smile: But we are all keeping backups of our stuff... right??? :slight_smile:

Cheers.

Any particular reason why you can't use the NTFS comment field to put your tags in there? DOpus can search these.

Also, assigning files to file collections based on whatever criteria you would otherwise associate with files inside a tag or database reference gets the same sort of thing accomplished...

Edit note- I actually use file collections as a way to define music 'playlists' that are always valid for me regardless of whatever media player I'm trying out - though of course most players support standard m3u playlists.

RhinoBanga: Searching is not at all what I am talking about. del.icio.us is not about searching, Google is for that.

steje: File collections are static, pre-defined ways of combining information. I can't possibly know ahead of time what combination of tags I may need at any one time.

Even if I defined just 10 tags in my entire filing scheme, the possible combinations among them would be over 1,000 (2^10 - 1 = 1,023 non-empty subsets)! And it grows geometrically from there.
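As a rough check on that count: with n tags, the number of distinct non-empty tag combinations is 2^n - 1, and each extra tag roughly doubles it. A small sketch that enumerates the combinations for ten placeholder tags (the tag names are invented) and compares the result with the closed form:

```python
from itertools import combinations

tags = [f"tag{i}" for i in range(1, 11)]  # ten placeholder tag names

# Enumerate every non-empty subset of the tags, size 1 up to size 10.
enumerated = sum(
    1 for size in range(1, len(tags) + 1) for _ in combinations(tags, size)
)

# Closed form for the same quantity.
closed_form = 2 ** len(tags) - 1

# Pairs alone (two tags intersected) are a much smaller number.
pairs = sum(1 for _ in combinations(tags, 2))

print(enumerated, closed_form, pairs)
```

Pre-building a static collection for each of those combinations is clearly impractical, which is the argument for computing them on demand.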

What I am talking about here is faceted classification (I believe that's the official term for it). I am talking about dynamically browsing your file system based on any type of characteristic (or tags) organized in facets.

I am talking about dynamically browsing your file system, making decisions of what to view on the fly, without having to know what you will need first, before you actually need it (as is the case with file collections). Also, as soon as you are done with what you need, the "results" of your browsing get discarded immediately (which, again, is not the case with file collections).
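One way to picture this kind of on-the-fly browsing, as a sketch only: keep an index from file to tags, and for any set of tags picked so far, compute both the matching files and the remaining tags that could narrow the view further. The `index` contents and the `browse` function are hypothetical, and nothing is stored once the result has been shown:

```python
from collections import Counter

# Hypothetical in-memory index: file path -> set of tags.
index = {
    "paper.doc": {"philosophy", "greek", "aristotle", "ethics"},
    "plato.doc": {"philosophy", "greek", "politics"},
    "kant.doc":  {"philosophy", "modern", "ethics"},
    "song.mp3":  {"music", "blues", "rock"},
}

def browse(selected):
    """Return (matching files, counts of further tags that would narrow them)."""
    selected = set(selected)
    # A file matches when it carries every selected tag.
    matches = sorted(p for p, tags in index.items() if selected <= tags)
    # Count the not-yet-selected tags among the matching files only.
    facets = Counter(t for p in matches for t in index[p] - selected)
    return matches, facets

matches, facets = browse(["philosophy"])
print(matches)
print(facets.most_common())
```

Each click on one of the offered tags would simply call `browse` again with one more tag selected, so the user never has to know the useful combinations in advance.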

You guys are both completely missing the point here. If you find this topic interesting at all, do a search on "faceted classification" in Google. It's an exciting possibility that DOpus would do well to be first in the market with.

Well, I don't really think I'm 'completely missing the point'. I do understand there are differences between the existing alternatives we've suggested and the kind of idea behind "faceted classification" and what we're supposed to see in WinFS someday... and obviously there is a lot of attention being drawn to the functionality that MS will provide with their implementation, so I didn't mean to imply your idea didn't have merit or anything.

But personally, I think there's a lot of hype about how 'dynamic' it's really going to be. At least from what I'd read on the functionality WinFS was going to provide; accessing files based on keyword/tag metadata still starts with a 'manual' effort on the part of the user to associate keywords with specific files/filetypes. Of course, getting to those files based on multiple keyword searches is more powerful than file collections... but it's not really all that different from Rhino's search suggestion - because from what I understand, the representation of files based on those keywords is indeed a 'search' against the keyword data kept for the files in the FS database... albeit probably faster.

steje, thanks for writing.

I believe we are both understanding the issue, albeit from different perspectives. One thing I need to reinforce here, I guess, is the fact that faceted classification provides something that search doesn't (and is in fact, cognitively, the opposite side of it): Browsing.

Search on keywords is fine when I know exactly what I am looking for. However, if I have a large file collection, I need to rediscover what I have, and that's where browsing comes in.

You used music collections as an example. So let me give you an example based on that. There are times, that I don't really know what I am in the mood to listen to--I don't really know what I want. Do I remember what I have? You can bet I don't. After years of collecting music, there are albums that I haven't listened to in 3, 4 or more years. I may remember some of what I have, but definitely not all. So I can't search something I don't remember. I need to "rediscover" it. I need to see it in front of my eyes to be reminded that "aaah right.. I have this too."

So, I need to browse my collection to get ideas (or suggestions) of what to listen to based on my collection itself.

There's a huge mental & cognitive distance between search and browsing which I believe we both understand, so I won't go deeper into it.

You also mention that the user manually has to add/edit tags. Well, don't you manually populate your file collections in DOpus? Don't you manually maintain them? I would argue that using tags the process is easier, because, say, once you reorganize the tags, the contents of those tags get reorganized at once too (sort of, like working with styles in a Microsoft Word document--you change the style, you change everything that's based on it.)

As for WinFS, it has much more ambitious goals than what we are discussing here, so I will consider it beyond the scope of this post. What we are discussing here is already being done. I have already mentioned del.icio.us on the internet (which is a big success, at least among geeks). On the PC side, there is EverNote, a note and web-clipping collection, processing, organizing and retrieving application (www.evernote.com).

Everyone who is using del.icio.us swears by it.

Everyone who is using EverNote swears by it.

The web and the blogosphere are full of talk about tagsonomies. Especially non-linear minds and creative types are drawn to it like bees to honey.

Yahoo has already implemented this with Yahoo MyWeb 2.0. Google with Gmail (which is also a huge success--no folders there, just tags.)

I think there's a pattern emerging here, don't you think?

This is a one-of-a-kind opportunity for DOpus to gain new customers without alienating, or even affecting, existing ones (nothing would have to be changed on the software, just added).

I can also guarantee you that DOpus would get a TON of press and talk in blogs, forums and the web, just for this capability alone.

And for being the first file manager (ever) that brings some of the benefits of tags, faceted classification and WinFS. Here. Today.

I think the publicity alone would make it worth it all by itself.

Cheers guys,

-ilias.

PS. If you are not using EverNote, you should. It's free. It's the best app installed on my PC right now. Hugely useful.

Just to clarify in my own mind... my vision of how this would actually be implemented is tainted by what I've read about plans for WinFS. It seemed like you would be able to define a Virtual Folder that you could browse just like a 'real' folder... and taking the point you've made about 'search vs. browse' into account; the view provided by such a 'vfolder' is actually the result of a search against the filesystem database to show data whose keyword/tag values match the definition of what you want the virtual folder to show you. So under the hood, such folders are really 'saved' search operations...
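To illustrate the "saved search" reading of a virtual folder: the folder stores only its definition (here, a set of required tags), and its contents are recomputed from the index every time it is opened. This is a sketch under that assumption, not a description of how WinFS or Opus actually work; `VirtualFolder` and the sample index are invented names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualFolder:
    """A folder that stores a query, not a list of files."""
    name: str
    required_tags: frozenset

    def contents(self, index):
        # Re-run the saved search against the current state of the index.
        return sorted(p for p, tags in index.items()
                      if self.required_tags <= tags)

# Hypothetical index: file path -> set of tags.
index = {
    "paper.doc": {"philosophy", "greek", "ethics"},
    "plato.doc": {"philosophy", "greek"},
}

greek_ethics = VirtualFolder("Greek ethics", frozenset({"greek", "ethics"}))
before = greek_ethics.contents(index)

# Tagging a new file changes what the folder shows, without editing the folder.
index["stoics.doc"] = {"philosophy", "greek", "ethics"}
after = greek_ethics.contents(index)

print(before, after)
```

The difference from a hand-maintained collection is that membership is derived from the tags, so nothing has to be added to the folder by hand.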

But going back to the music 'paradigm' (there, I've used that word once in 2005), it's got a lot to offer up about the value of the functionality you're suggesting. What you're looking for is very much what many media players have allowed us to do with media files for a long time - but extends the benefit of being able to personalize the way you view your data to data types other than just 'music, video, and pictures' and does so in a way that is centric to the whole file system rather than just specific file formats...

Personally, I'm concerned that I'll just wind up spending the same sort of time organizing and maintaining keywords, tags, and virtual folders the way I currently do with normal folder structures and so forth :slight_smile:. Anyways

Hmmm. Still feels like this could be done with file collections (and probably some new commands).

Essentially there is no real difference between a tag or label and a file collection. Adding a tag to a file corresponds to membership in a file collection associated with that tag.

However, you would need some functionality to browse these collections more dynamically. This would require taking the intersections and unions of file collections to present the required results.

In your example, the paper could be added to the collections:

  • philosophy
  • Aristotle
  • ethics
  • Greece

The file collection philosophy could be way too large. The same would hold for the file collection Greece. However, the files that are in both the Aristotle and Greece collections might be what you are looking for.

These are essentially just views / queries on a database which is in my opinion far superior to the traditional hierarchical filesystem.

It would be an interesting experiment to try to define a union and intersection on file collections which return the result in some new (temporary) collection. I will try some experiments on this when I get the time to do so. I think some of this should be possible using the DOpus raw commands.
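The union and intersection idea can be sketched independently of Opus by treating each file collection as a plain set of paths. This does not use any DOpus commands; the collection contents and the helper names are invented for the example, and the temporary collection is just another dictionary entry:

```python
# Hypothetical file collections: collection name -> set of member files.
file_collections = {
    "philosophy": {"paper.doc", "plato.doc", "kant.doc"},
    "Aristotle":  {"paper.doc", "poetics.doc"},
    "ethics":     {"paper.doc", "kant.doc"},
    "Greece":     {"paper.doc", "plato.doc", "poetics.doc", "athens.jpg"},
}

def intersect(*names):
    """Files present in every named collection (needs at least one name)."""
    return set.intersection(*(file_collections[n] for n in names))

def unite(*names):
    """Files present in any of the named collections (needs at least one name)."""
    return set.union(*(file_collections[n] for n in names))

# Store the result as a temporary collection that can be discarded afterwards.
file_collections["_temp"] = intersect("Aristotle", "Greece")
print(sorted(file_collections["_temp"]))

del file_collections["_temp"]  # throw the result away once browsing is done
```

Both helpers return a new set, so the source collections are never modified by a query.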

That evernote program looks interesting. I'll take a look at it. It might also be of interest to look at the Opera mail client which uses filters and views on a single database to organize mail (I believe GMail does something similar, but I have no experience with that).

-Caine.

Anyhow, what Caine just mentioned above about being able to browse data that's common between 2 collections is a good example of the sort of powerful advantage that what iLiAS is advocating has over current Opus File Collection design.

I wonder if GPSoft would be interested in shifting the whole file collection idea towards this methodology?

The rub is that a lot of time could be spent overhauling this part of Opus to provide such a feature only to have Microsoft come out with WinFS relatively soon thereafter and possibly render the work somewhat redundant... or at least open the door to people making requests like 'I like Opus virtual folders and filesystem tags, but can Opus fully integrate with WinFS FS database?' :frowning:. Sounds like a possible PRE-inventing the wheel hazard (ha - I make myself laugh at least)...

pre-inventing hahaha good one steje :slight_smile:

I also agree with everything that Caine wrote. Spot on.

I believe that this is a very fertile discussion, and I think we're really getting somewhere--it looks like we are mentally on the same page.

A couple of points though, especially about what steje mentioned regarding pre-inventing and WinFS.

  1. This system is simple and quick to implement. A simple database storing the file information (filename, location, tags & possibly simple file metadata, such as filetype, author, mp3 bitrates etc.) I can't imagine this taking a long time to be coded and tested. We're talking a simple database here guys.

  2. WinFS won't be out until mid-2008 at the earliest. Plenty of time till then.

  3. When WinFS does come out, then it's just a matter of copying the DOpus tags and metadata to WinFS. A simple utility (built into DOpus or a separate download) could do the trick.

We're certainly not talking about re-(or pre-)inventing anything here. I thought about that too, actually, before I posted my original post. But then I thought how simple this feature really is to implement. We are certainly not talking about rocket science technology here :slight_smile: Just an extension of file collections, really (as Caine explained better than I ever could.)

Also, a couple of corrections: I mentioned Gmail, but Gmail is not an example of what we are talking about here. Why? Because it doesn't support label (tag) intersections. To me, the intersection capability is 99 percent of the power of this, as it is the one that allows for faceted classification.

Also, Gmail (and del.icio.us, for that matter) don't allow for hierarchical tags. They are flat, which is a no no for what we are talking about here.
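A sketch of what hierarchical tags combined with intersection could look like, assuming tags are written as slash-separated paths (that notation, and the `expand` and `matches` helpers, are assumptions made for this example). A file tagged with a child tag is treated as also carrying every ancestor of that tag:

```python
def expand(tag):
    """Return the tag plus all its ancestors: 'a/b/c' -> {'a', 'a/b', 'a/b/c'}."""
    parts = tag.split("/")
    return {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}

def matches(file_tags, wanted):
    """True when the file carries every wanted tag, directly or via a child tag."""
    effective = set().union(*(expand(t) for t in file_tags))
    return set(wanted) <= effective

paper = {"philosophy/greek/aristotle", "ethics"}

print(matches(paper, {"philosophy/greek", "ethics"}))
print(matches(paper, {"philosophy/modern"}))
```

With flat tags the first query would fail unless the paper had also been tagged "philosophy/greek" by hand; the hierarchy supplies that tag implicitly.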

Omea & Omea Pro (www.jetbrains.com) have a hierarchical category system that can also be applied to files, but again, no intersection.

The brightest example, as I wrote earlier, is definitely EverNote. What I think would be a good idea, is for me to sit down and create a screencast of how I use EverNote (which is not how everyone uses it, by the way), as it provides a "model" of how I envision a file manager to work in a similar fashion.

I have a feeling that once you see how I use EverNote, you will be sold. The challenge and the bets are on me :slight_smile:

I'll try to do that today or tomorrow.

Cheers guys,

-ilias

What you're asking for will come with Windows Vista (and is already available on OSX): it's a database filesystem. And I'm afraid there's no way to implement it at the DOpus level in an efficient manner. (plus it would only work with DOpus...)

Leo.

hey there Leo...

a) Last I checked, the jury was still out as to whether or not Vista would even ship with WinFS or not... at least the 'initial' release of Vista...

b) We know it's a database filesystem, (:-)) and I don't think it would be 'significantly' less efficient for DOpus to implement it than to have it built into the filesystem.

c) I just like conversing back and forth with iLiAS and Caine about it :slight_smile:... it sounds like you've seen this sort of system in action on OSX, what do you think of it?

Hi again guys,

a few points:

Vista will not ship with WinFS. This is confirmed. Instead, Vista will have something like virtual folders based on searches. I tend to put this capability in the same space as the DOpus file collections, since they are predefined and rather static in their definition (unless manually changed.)

WinFS will ship later (much later) and it will be an addon, not only to Vista but to Windows XP as well (and maybe to Windows 2000, although I am not sure about this one). I already have WinFS Developer Beta 1, which I have successfully installed on my WinXP machine. You really can't do anything useful with it, though, because there's (naturally) no software that understands how to take advantage of it. And because it is, of course, still very incomplete.

OS X does not have the capability we are discussing here. I've read this in several places, and it's due to misconception or confusion regarding this functionality. OS X does not have a database file system. What OS X does have is, again, something similar to what Vista will have with virtual folders. The OS X file system is just as hierarchical (tree-based) as the Windows file system (NTFS) is.

I want to stress again that discussing database file systems such as WinFS is a much much larger scope than what this thread was originally about. WinFS, for example, will provide common file storage mechanisms to enable unification of file types across applications. So, in theory, 5-10 years from now we will not have this mess where each application writes to its own file format (binary, xml, or whatever the programmer was in the mood for the day he coded it) which other applications can't directly read, write, or otherwise use. For example, if I am a programmer and I need to look up or edit an Outlook contact in my application, I can't read/write directly to the Outlook data store--my only hope is that Outlook will have an API that I can invoke to ask it to read its own data store and supply me this information. With WinFS, in theory at least, this will all be a thing of the past.

What I want from DOpus, as you can decipher from all this, is much much simpler.

More later.

Cheers,

-ilias.

Ilias, I am curious how you use EverNote. Do you mean you use it somewhat differently besides its automatic categorization and live-searching filter?

While they don't support the browsing that you're referring to, there are numerous desktop search tools that let you find the same files... AND without having to maintain all of this metadata. As much as I like the concept of both virtual folders and a db file system, there's no way that I'm maintaining multiple tags (and yes, I'm a heavy del.icio.us user) on the... let me check... 18,316 work documents that I have. In my case, all of those files are loaded with technical key words that a good search tool such as X1 can pick up automatically and save as a saved search. Granted, I'd rather have this functionality incorporated into my DOpus file manager but I guess there's something to be said for best of breed software.

I don't know how different Google Desktop and X1 are (I've never used either) but the Find panel in Opus gives the option of using Google Desktop if it is installed.

[quote]
nudel wrote:
I don't know how different Google Desktop and X1 are (I've never used either) but the Find panel in Opus gives the option of using Google Desktop if it is installed.[/quote]
I'm not greatly bothered, but for the record...

I just searched using Google Desktop and found 15 instances of what I was looking for. I repeated the search, using DOpus Find panel, Google search, and found 1.

Perhaps I've misunderstood what the DOpus implementation is supposed to achieve, but the Google Desktop search did not use any special options.

Good to hear from you Bernard - cheers!

The integrated Google Desktop Search in Opus will only return real files - it doesn't look in email messages, saved webpages, etc.

G'day Steje...

I follow your regular contributions with interest, as, I'm sure, all of us do.