Tuesday, December 20, 2005

Anjo Anjewierden: What is a Topic? 

Because it is the holiday season, Anjo Anjewierden asks: what is the topic? Anybody who has ever attended a party (which are seasonally superabundant at the moment) and casually joined a conversation knows that it may take a while to actually find out. One reason is of course that some conversations (especially at parties) serve no other purpose than giving the participants the idea that they belong to the same group or, as in the more flirtatious ones, raising the level of intimacy between the participants. Thus, if you are an outsider to the group, you may come to the conclusion that people are discussing work, when what they are really doing is building on their shared experience to stress their commonality and shared interests. To some extent that seems to be the case for blogs as well, especially if they address a community that interacts to some degree. However, I at least would already be quite happy if we could detect the more superficial topic automatically.

Lilia equates the topic with the tag you would put on a document such as a blog post. She has a point, but I would think that a tag's primary purpose is to make it possible to find something again, by generating enough, and sufficiently general, associations (mirroring the precision/recall dichotomy in information retrieval). On the other hand, I think that when people are asked "what are you talking about?" they will try to give a characterisation that is sufficiently precise in the given context. In a sense this may explain why Anjo is successful with a simple information retrieval measure like TF/IDF, which is after all designed to find terms that are distinctive for a document compared to other documents (the context!). If you have not read the full conversation, a word like Blogwalk, Sigmund or Skype may be quite enough as a characterisation. Lilia, on the other hand, has been engaged in the conversations and so has more of a context. She will therefore subconsciously remember similar discussions and change her context correspondingly.
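To make the "distinctive compared to other documents" idea concrete, here is a minimal sketch of one common TF/IDF variant (term count divided by document length, times log(N / document frequency)). It is not necessarily the weighting Anjo used, and the three toy documents are invented purely for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of non-empty token lists. Term frequency is the raw
    count divided by the document length; idf is log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Toy collection, invented for illustration only.
docs = [
    "skype presence skype communication".split(),
    "communication weblog community".split(),
    "communication weblog knowledge".split(),
]
scores = tf_idf(docs)
top = sorted(scores[0], key=scores[0].get, reverse=True)
print(top)  # ['skype', 'presence', 'communication']
```

Because "communication" occurs in every document of this toy collection, its score is zero: it says nothing about what sets the first document apart, which is exactly the role the other documents play as context.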

Now what do I mean by similar discussions? I wish I knew exactly, but let's take the discussion on Skype and presence as an example. Both terms have a high frequency and a high TF/IDF score in this discussion. In descending order of TF/IDF score, Anjo finds presence, Skype, communication, IM (= Instant Messaging, I guess) and communication tool. Knowing Lilia, I know that she knows that Skype is a communication tool that supports instant messaging and presence, and that presence is the capability to tell whether you are available for communication (so there you have ontological relations). Now what does it mean that Lilia remembers similar discussions? I don't know exactly of course, because I don't really know everything she has read, and moreover the human brain is subtle. But let us suppose that, using her ontological knowledge, she will first "score" a hit for communication tool for every mention of Skype, and score "a bit of a hit" for presence and quite a bit for communication (because this subject is dear to her heart). This puts communication tool, presence and Instant Messaging higher up the list. Again knowing Lilia, I know that she has read, blogged and talked about these subjects. Thus I think she, consciously or subconsciously, changed her context (if you want to think in IR terms, changed the "document collection" in which to compute TF/IDF scores) and asked herself what it is that characterises this discussion in this more specialised context. In this context (and in her social and blog circle) Skype, IM and presence are not so characteristic anymore, so my guess (not having read the discussion) is that she comes up with a characterisation like

presence with skype

or likely more specific characterisations such as

How do you switch on presence in skype


presence in skype is really great/awful/mediocre


skype now does presence too !
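The two steps guessed at above (letting each mention of a term also count towards related concepts, and then judging distinctiveness against a more specialised collection) can be sketched roughly as follows. Everything here is an assumption made for illustration: the relation weights, the two toy collections and the smoothed idf formula are invented, and nobody claims this is what actually happens in Lilia's head.

```python
import math
from collections import Counter

# Hypothetical "ontological" weights: how much one mention of a term
# also counts as a hit for a related concept.
RELATED = {
    "skype": {"communication tool": 1.0, "presence": 0.3, "communication": 0.6},
}

def boosted_counts(tokens, related=RELATED):
    """Count terms, spreading part of each hit to related concepts."""
    counts = Counter()
    for token in tokens:
        counts[token] += 1.0
        for concept, weight in related.get(token, {}).items():
            counts[concept] += weight
    return counts

def idf(term, collection):
    """Smoothed idf, log((1 + N) / (1 + df)), so unseen terms do not divide by zero."""
    df = sum(1 for doc in collection if term in doc)
    return math.log((1 + len(collection)) / (1 + df))

discussion = ["skype", "presence", "skype", "im", "skype"]
counts = boosted_counts(discussion)
print(counts.most_common(2))  # skype and communication tool lead, 3.0 hits each

# Two invented collections: a general one, and one from a circle in which
# Skype and presence are everyday subjects.
general = [{"weblog", "community"}, {"knowledge", "weblog"},
           {"skype", "weblog"}, {"learning", "community"}]
specialised = [{"skype", "presence", "im"}, {"skype", "im"},
               {"presence", "skype", "awful"}, {"skype", "presence", "switch"}]

for term in ("skype", "switch"):
    print(term, round(idf(term, general), 3), round(idf(term, specialised), 3))
```

In the specialised collection "skype" appears in every document, so its idf drops to zero, while a rarer word like "switch" becomes the distinctive one. That is the sense in which changing the collection pushes the characterisation towards something more specific than just the term Skype.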



It is thus not so surprising that she wants to have a Sigmund-type co-occurrence analysis of the conversation, because co-occurrence tends to emphasise the relations between terms rather than just the terms themselves. Being a statistical method, however, it cannot really see what kind of semantic relations may underlie the observed co-occurrences.
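As a rough illustration of what a co-occurrence count does and does not tell you, here is a generic sentence-level sketch. It is not a description of how Sigmund actually works, and the three "sentences" are simply the example characterisations from this post.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sentences):
    """Count how often each unordered pair of terms shares a sentence."""
    pairs = Counter()
    for tokens in sentences:
        # sorted(set(...)) gives every pair once, in a fixed order
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs

sentences = [
    "how do you switch on presence in skype".split(),
    "presence in skype is really great".split(),
    "skype now does presence too".split(),
]
pairs = cooccurrence(sentences)
print(pairs[("presence", "skype")])  # 3
```

The count says that presence and Skype keep turning up together, but nothing about whether the relation is "supports", "lacks" or "is awful at". That part still needs a reader, or an ontology.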

Now I hope I know Lilia well enough to be right in believing that she does not mind me blogging about what she thinks :-).


© Copyright 2004-2006 Rogier Brussee.
These are my personal views and do not necessarily reflect those of my employer.