Monday, January 31, 2011

Deduction, My Dear Watson - Blog 4

Several years back, while browsing a used book shop, I picked up the complete adventures of Sherlock Holmes. Reading through the book, I noticed that a shocking number of people living in 1880s London were poisoned. After reading several of the stories, I got to wondering about poisoning and decided that I knew far too little about poisons. So after reading the reviews of a number of books about poisoning on Amazon.com, I ordered one that looked fascinating. The book arrived and I read through it. It has been over a year since I read it, and only two facts from it seem to have really stuck in my head. First, cows will avoid eating marijuana plants after eating them only once. The obvious conclusion here is that cows do not enjoy being high. The second fact is that koala bears subsist on eucalyptus leaves, which are extremely poisonous. Koala bears are able to cope with the poison, but the leaves leave them high. This means that koala bears are high all the time. The reason I bring up these two completely irrelevant facts is to ask how my brain stores and recalls information.

In chapter 5, Weinberger discusses the limits of trees for data storage. He does this by discussing Linnaeus's "worms" category. The point he makes is that organizing data into a tree ignores every other connection between the items in it. Clearly this is not how the human brain works. When I think of Sherlock Holmes, I also think of stoned koala bears and sober cows. I cannot think of any tree that would connect these topics, yet they all share a common idea in my head: poison. There are other ways to approach these topics in my head where they are not grouped. For instance, if someone were to ask me what I know about Australia, one of the first things to come to mind would be that koala bears are drug addicts. I would not think about Sherlock Holmes or sober cows. There are also topics other than poison that would bring all three items to mind. If I were having a discussion about drugs with someone, all three would come to mind: the cow and the koala for the obvious reason that both directly involve drugs, and Sherlock Holmes because, having read the books, I know that he is an occasional user of cocaine. Weinberger calls linking one word to many things tagging. He attributes its current widespread usage to Delicious.com, a website that allows users to build lists of websites and then access them via tags that were assigned to each site as it was added to the list.
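
To make the tree-versus-tags difference concrete, here is a small Python sketch. The category names, the tag assignments, and the `path_to` and `items_tagged` helpers are all hypothetical, made up to mirror the examples in this post; it is not how Delicious or any real site actually stores its data.

```python
# A toy tree: every item sits under exactly one branch.
# These category names are hypothetical, chosen to mirror this post.
tree = {
    "Animals": {
        "Mammals": ["koala", "cow"],
    },
    "Literature": {
        "Detective fiction": ["Sherlock Holmes"],
    },
}


def path_to(item, node, path=()):
    """Return the one branch path that leads to item, or None."""
    if isinstance(node, list):
        return path if item in node else None
    for name, child in node.items():
        found = path_to(item, child, path + (name,))
        if found is not None:
            return found
    return None


# Tags: any item can carry many labels, and one label can point
# back to many items. These tag assignments are also hypothetical.
tags = {
    "koala": {"poison", "drugs", "australia"},
    "cow": {"poison", "drugs"},
    "Sherlock Holmes": {"poison", "drugs"},
}


def items_tagged(tag):
    """Return every item carrying the given tag, sorted by name."""
    return sorted(item for item, labels in tags.items() if tag in labels)


print(path_to("koala", tree))      # ('Animals', 'Mammals')
print(items_tagged("poison"))      # ['Sherlock Holmes', 'cow', 'koala']
print(items_tagged("australia"))   # ['koala']
```

In the tree, the koala can only be reached through one branch, and nothing there links it to Sherlock Holmes. With tags, asking for "poison" pulls all three together, while "australia" brings up only the koala.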

Weinberger talks about tagging as the best way to keep track of the third order of organization. While I agree that tagging is the best way we currently have to sort through the massive amounts of data stored digitally, I still see a number of problems with it. Weinberger uses Flickr as an example of tagging in action. People on Flickr tag their images with several keywords when they upload them. When someone searches Flickr for one of those keywords, the image and all others that share that tag are returned. This allows for a limitless number of categories. Here is the problem. If I were to upload a picture of a koala bear eating, I would naturally tag it with OMG cute, Koala Bear, Tree, Poison, Sherlock Holmes. That last one is not going to make sense to anyone but me. Now when someone searches for Sherlock Holmes, they will wonder at the random picture of a koala. This does not seem like a big deal; I am one person, and it is only one miscategorized image. However, everyone makes connections like this, and the picture is not necessarily miscategorized. If you are me, the image is tagged perfectly. Only when you are not me does the tag become wrong. To combat this sort of problem, tags tend to be very generic and describe what is in the image with single words. For example, a picture of a doctor operating would be tagged with words like Doctor, Operation, Incision, Intestines. Anyone trying to find general pictures of a doctor would get this image back as part of their search. However, someone looking for a specific image of their own doctor will have a very hard time finding it. Because tags are kept generic to avoid confusion like a koala picture tagged with Sherlock Holmes, finding one specific image becomes very difficult.

In addition to the above problem, tags allow for little interconnectivity between them. A tag is like a light switch in that it is either on or off; there are no degrees of variance. If I search Flickr for cats, it will look at images to see if they have a cat tag. If it finds a picture with a cat tag, it will return it to me; otherwise it will ignore the image. There is no way to indicate that an image without a cat tag is somewhat related to my search. For instance, what if the picture contains a saucer of milk? Clearly this image is in some ways related to cats, but the current tag system would not know this and would not return the image when I searched for cats. Our brains are able to make this kind of connection. When I think of stoned koala bears, ducks come to mind. Not because ducks are habitual drug users, but because they were mentioned in the same poisons book. There is a small bean called the castor bean. It is extremely poisonous, and you may be more familiar with the poison made from it: ricin. Two castor beans are enough to kill a human. Twelve will kill a horse. It takes fifty castor beans to kill a duck. For whatever reason, ducks are very resistant to ricin. The koala fact and the duck fact both pertain to animals eating poison, yet it would be very difficult to show this connection through tags.
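
The light-switch behavior can be shown with a tiny Python sketch. This is a toy model of tag search in general, not Flickr's actual search; the photo names, their tags, and the `search` helper are hypothetical.

```python
# Each photo is just a set of tags. The photo names and tags are
# hypothetical examples, not real Flickr data.
photos = {
    "tabby_on_couch.jpg": {"cat", "couch", "pet"},
    "saucer_of_milk.jpg": {"milk", "saucer", "kitchen"},
    "koala_eating.jpg": {"koala bear", "tree", "poison"},
}


def search(tag):
    """Return the photos whose tag set contains tag exactly.

    Membership is all-or-nothing: a photo either has the tag or it
    does not, so there is no score for 'somewhat related'.
    """
    return [name for name, tagset in photos.items() if tag in tagset]


print(search("cat"))   # ['tabby_on_couch.jpg']
print(search("milk"))  # ['saucer_of_milk.jpg']
```

The saucer-of-milk photo never shows up in a search for "cat", because the only test is plain set membership, the same on-or-off switch described above.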

So tags clearly work better than trees in the digital world. This is because categories can be created both when data is added and when it is searched, whereas with a tree all categories have to be picked before the tree is ever laid out. Tags are not perfect, though. They make it very difficult to look for specific things due to their generic nature, and they fail to show the degrees of connection that our brains use every day. There must be a way to allow for these connections, but if there is, I am not aware of it.

On a final note, everyone should read a little Sherlock Holmes. Preferably not The Hound of the Baskervilles, as it is one of the worst Sherlock Holmes stories.

2 comments:

  1. This is a really great post in terms of summary and examples. I quite like the stories you weave throughout, as they make the post really interesting and also show me that you're able to take the reading and apply it to real-world examples. Great job. The one thing that hurt you here, though, is that you didn't quite answer the prompt, at least insofar as you didn't make any particular references to the O'Reilly articles.

  2. I agree. Tagging is important, but it is only as reliable as the user tagging the content. As you pointed out, things can be mis-tagged or tagged only for a specific reason that may not make sense to the rest of us. I guess what I am getting at is that with Web 2.0 we have to have a sense of trust and honesty with each other.
