Jason's DTC 356: January 2011

Monday, January 31, 2011

Deduction My Dear Watson - Blog 4

Several years back, while browsing a used book shop, I picked up the complete adventures of Sherlock Holmes. Reading through the book I noticed that a shocking number of people living in 1880s London were poisoned. After reading several of the stories I got to wondering about poisoning. I decided that I knew far too little about poisons. So after reading the reviews of a number of books about poisoning on Amazon.com I ordered one that looked fascinating. The book arrived and I read through it. It has been over a year since I read the book and only two facts from it seem to have really stuck in my head. First, cows will avoid eating marijuana plants after eating them only once. The obvious conclusion here is that cows do not enjoy being high. The second fact is that koala bears subsist on eucalyptus leaves, which are extremely poisonous. Koala bears are able to cope with the poison but the leaves leave them high. This means that koala bears are high all the time. The reason I bring up these two completely irrelevant facts is to question how my brain stores and recalls information.

In chapter 5 Weinberger discusses the limits of trees for data storage. He does this by discussing Linnaeus's worms category. The point he makes is that when data is organized into a tree, all other connections between the items in the tree are ignored. Clearly this is not how the human brain works. When I think of Sherlock Holmes I also think of stoned koala bears and sober cows. I cannot think of any tree that would connect these topics. These topics all share a common idea in my head. That idea is poison. There are other ways to approach these topics in my head where they are not grouped. For instance, if someone were to ask me what I know about Australia, one of the first things to come to mind would be the fact that koala bears are drug addicts. I would not think about Sherlock Holmes or sober cows. There are also topics other than poison that would bring all three items to mind. If I were having a discussion about drugs with someone, all three topics would come to mind: the cow and the koala for the obvious reason that both directly involve drugs, and Sherlock Holmes because, having read the books, I know that he is an occasional user of cocaine. Linking one word to many things is called tagging by Weinberger. He attributes its current widespread usage to Delicious.com, a website that allows users to build lists of websites and then access them via tags that were assigned to each site as it was added to the list.
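A minimal sketch of tagging as a data structure, in Python. The tag names and the helper functions are made up for illustration; they only mirror the examples in this post and are not how Delicious.com actually works. Each item carries several tags, and inverting that mapping lets any one tag pull up every item that shares it, which a single tree cannot do.

```python
from collections import defaultdict

# Hypothetical tags, chosen to mirror the examples in this post.
ITEM_TAGS = {
    "Sherlock Holmes": {"poison", "drugs", "London"},
    "koala bears": {"poison", "drugs", "Australia"},
    "cows": {"poison", "drugs"},
}


def build_tag_index(item_tags):
    """Invert item -> tags into tag -> items so any tag can be looked up."""
    index = defaultdict(set)
    for item, tags in item_tags.items():
        for tag in tags:
            index[tag].add(item)
    return index


TAG_INDEX = build_tag_index(ITEM_TAGS)


def items_tagged(tag):
    """Return the items carrying a tag, sorted for stable output."""
    return sorted(TAG_INDEX.get(tag, set()))
```

Here `items_tagged("poison")` brings back all three topics, while `items_tagged("Australia")` brings back only the koala bears, much like the two lines of thought described above.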

Weinberger talks about tagging as the best way to keep track of the third order of organization. While I agree that tagging is the best way we currently have to sort through the massive amounts of data stored digitally, I still see a number of problems with it. Weinberger uses Flickr as an example of tagging in action. People on Flickr tag their images with several keywords when they upload them. When someone searches Flickr for one of those keywords, the image and all others that share that tag will be returned. This allows for a limitless number of categories. Here is the problem. If I were to upload a picture of a koala bear eating I would naturally tag it with OMG cute, Koala Bear, Tree, Poison, Sherlock Holmes. That last one is not going to make sense to anyone but me. Now when someone searches for Sherlock Holmes they will wonder at the random picture of a koala. This does not seem like a big deal; I am one person and it is only one oddly categorized image. However, everyone makes connections like this, and the picture is not necessarily miscategorized. If you are me the image is tagged perfectly. Only when you're not me does the tag become wrong. To combat this sort of problem, tagging tends to be very generic and to describe what is in the image with single words. For example, a picture of a doctor operating would be tagged with tags like: Doctor, Operation, Incision, Intestines. Anyone trying to find general pictures of a doctor would receive this image back as part of their search. However, if someone is looking for a specific image of their own doctor they will have a very hard time finding it. Because tags tend to be generic, to avoid confusion like a picture of a koala tagged with Sherlock Holmes, it is very hard to find a specific image.

In addition to the above problem, tags allow for little interconnectivity between tags. A tag is like a light switch in that it is either on or off. There are no degrees of variance. If I search Flickr for cats it will look at images to see if they have a cat tag. If it finds a picture with a cat tag it will return it to me; otherwise it will ignore the image. There is no way to indicate that an image without a cat tag is somewhat related to my search. For instance, what if the picture contains a saucer of milk? Clearly this image is in some ways related to cats, but the current tag system would not know this and would not return this image when I searched for cats. Our brains are able to do this. When I think of stoned koala bears, ducks come to mind. Not because ducks are habitual drug users but because they were mentioned in the same poisons book. There is a small bean called the castor bean. It is extremely poisonous. You may be more familiar with its common poison name. It is used to make the poison ricin. Two castor beans are enough to kill a human. Twelve will kill a horse. It takes fifty castor beans to kill a duck. For whatever reason ducks are very resistant to ricin. Both the koala bear and the duck are facts that pertain to animals eating poison. However, it would be very difficult to show this connection through tags.
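The light-switch behavior can be sketched in a few lines of Python. The photo names, tags, and the `search_by_tag` helper are all hypothetical, and this is only a toy stand-in for an exact-match tag search, not Flickr's real search.

```python
# Hypothetical photos and tags, made up for illustration.
PHOTOS = {
    "kitten.jpg": {"cat", "kitten"},
    "saucer.jpg": {"milk", "saucer"},
    "duck.jpg": {"duck", "pond"},
}


def search_by_tag(photos, tag):
    """Return photos that carry the tag exactly; there is no partial match."""
    return sorted(name for name, tags in photos.items() if tag in tags)
```

A search for "cat" returns only `kitten.jpg`. The saucer of milk is skipped because the test is a plain yes or no on the tag, with nothing in between to say "somewhat related".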

So obviously tags work better than trees in the digital world. This is because categories can be created both when data is added and when it is searched. With a tree, all categories have to be picked before the tree is ever laid out. Tags are not perfect. They make it very difficult to look for specific things due to their generic nature, and they fail to show the degrees of connection that our brains use every day. There must be a way to allow for these connections, but if there is I am not aware of it.

On a final note, everyone should read a little Sherlock Holmes. Preferably not The Hound of the Baskervilles, as it is one of the worst Sherlock Holmes stories.

Wednesday, January 26, 2011

On the organization of organization: Blog 3

Since I failed to discuss something that I am in charge of organizing in my last blog, I will start out this blog by doing so. I work for the information systems department at Schweitzer Engineering Laboratories (SEL). My title there is intern desktop system administrator. Part of my duties in this position is to create and maintain a set of tools that are used to install software on all computers purchased by SEL. One of the tools used is a script that installs all of the drivers and software that cannot be stored in a computer image. A computer image is a complete backup of everything on a computer. Setting up a computer with the software that all computers at SEL need to have and then making an image of it allows us to apply that image to other computers and thus save the time of installing every single piece of software on every computer. However, not all software can be moved about in this fashion, and all the drivers that relate to specific hardware must also be left out of the image.
For this reason I maintain a 500 line script that installs all of the leftovers that could not be installed on the image. In order to ensure that each computer gets its proper drivers it was necessary to devise a system to store the drivers for each computer model. In all, the script supports some 20 different models of computer. Originally I decided the easiest way to store all the drivers for these computers was to create a folder named hardware with 20 subfolders, one for each computer model. Inside these subfolders were stored the drivers for the model the folder was named after. The script would look up the model of the computer it was being run on and then use a case statement to find the proper folder and run the right commands to install the drivers in this folder. A case statement can be thought of as a list of possibilities. After looking up the model of the computer it was being run on, the script would compare it to each of the 20 cases that made up the case statement. If it matched any of the cases it would run the lines of code within that case and in doing so install the drivers for that particular model.
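The shape of that case statement can be sketched roughly as follows. This is an illustration in Python, not the actual script: the function name, the folder paths, and the model names other than the E4200 are assumptions, and the real install commands are left out.

```python
def driver_folder_for(model):
    """Pick the driver folder with one branch per model, like a case statement."""
    if model == "E4200":
        return "hardware/E4200"
    elif model == "E6400":  # hypothetical model name
        return "hardware/E6400"
    elif model == "D630":  # hypothetical model name
        return "hardware/D630"
    # ...a script in this style needs one such branch for every supported model
    else:
        return None  # unknown model: the script must be edited to add it
```

Every supported model needs its own branch, so the list of branches grows with each new model, and a model that is not listed gets nothing at all.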
There were two problems with ordering the drivers in this fashion. The first is a matter of storage. While computer hard drives in general are dropping in price, networked storage can still be expensive. This has to do with the need for data redundancy and the cost of maintaining servers. In addition to this, the location I was storing the drivers in is replicated to several other servers worldwide. This means that I needed to keep my hardware folder as small as possible so that it cost less and took less time to replicate to other locations. Many of the computer models we use at SEL share some drivers. My system did not allow for this. If two computer models shared a driver then I had to store that driver in the folders for both models. This is clearly a waste of space. The second problem with this organization scheme is that it required a case statement in the script. This meant that every computer model had to be listed, and whenever we began using a new model of computer the script had to be edited to include it. Out of the 500 lines that made up the script, perhaps 300 were taken up by this one case statement.
Recently I had cause to switch the scripting language used for the script. I decided to improve the system during this process. After careful consideration I decided to lump the computer models by series. Each model of Dell business laptop belongs to a series. For instance, model E4200 belongs to the 'E' series. It is common for computers within a series to share some drivers. So I changed the file structure so that in the hardware folder there were several series folders. They were named E_Series, D_Series, S_series, etc. Within each series folder was a folder for each model in that series and a common folder. Most of the drivers for each model were still stored in the folder named after that model, but any drivers that were shared by the entire series were stored in the common folder. This allowed me to significantly reduce the storage space required for all the folders. It also allowed me to rewrite the script so that it pulled the model number, looked in the folder for that model and installed any drivers there, and then installed any drivers in the common folder. If we begin using a new computer model we need only create the proper folder for it, and the script will find that folder and install its contents. By doing this I shortened the script by more than 200 lines of code. While the reorganization was not strictly necessary for the new script to work, it made coding the script a lot easier.
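A rough sketch of the folder-driven lookup, written in Python for illustration (the post does not say which scripting language the real script uses). The function names are hypothetical, and the rule that the first letter of the model gives the series is an assumption based on the E4200 example. The sketch only lists the driver files in install order; the actual install commands are not shown.

```python
from pathlib import Path


def series_folder_name(model):
    """Assume the first letter of the model names its series, e.g. E4200 -> E_Series."""
    return f"{model[0]}_Series"


def driver_files_for(hardware_root, model):
    """List a model's own drivers first, then its series' shared drivers."""
    series_dir = Path(hardware_root) / series_folder_name(model)
    files = []
    for folder in (series_dir / model, series_dir / "common"):
        if folder.is_dir():  # a missing folder simply contributes nothing
            files.extend(sorted(p for p in folder.iterdir() if p.is_file()))
    return files
```

Because the lookup is driven by the folder layout rather than by a list of models in the code, supporting a new model only means creating its folder.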
So there you have it, my experience with organizing and reorganizing. I would say I definitely tend to be a lumper when it comes to organization. I would also say that this is a good example to contradict the book with, as it is one instance when it was important to have just one copy of a digital object.

So, on to the book and chapter 4. I found this chapter interesting because I had never realized the extent to which humans categorize. When Weinberger talks about the power of metadata to give us context for a statement I was amazed. I never before realized how much definitions are trees. The entirety of my understanding of the world is one enormous tree. I don't see it because I have been living within it my entire life. I am going to go ahead and keep up with the coding references here because they seem to mix well with this topic. Most programming languages require that you declare variables. A variable can be thought of as a container with a name. When declaring a variable you are telling the computer three things. First, you have to tell the computer to create something and refer to it by the name x, x being whatever name you assign to the variable. Second, you have to tell the computer what type of information will fill that variable. This is so the computer knows how to treat the variable. For instance, I might tell the computer that the variable is an integer. The computer now knows that the variable will hold a whole number and so will treat its contents like a number. If I tell it to add 12 to the variable it will know I am referring to the value of whatever number is in the variable. Lastly, you must tell the computer what value should be stored inside this variable. This does not need to be done right away but must be done before the variable is used in any computations.
What I really want to do here is focus on the second part. When I tell the computer that the variable is an integer the computer automatically applies all the rules it knows to that variable. It can be used for math. It cannot be used to sort alphabetically. It can be used to store numbers. It cannot be used to store letters. I never before realized that this is how my brain works. If something is a bird it will have wings. It won't be able to breathe under water. It won't have fur.
Many of the more modern programming languages no longer require a programmer to explicitly state what type of data will be stored in a variable. This is largely because computers are fast enough to run tests against the variable to determine what type of data it holds and then treat it accordingly. I feel like this has a strong correlation to the Colon Classification mentioned in the book. We are headed towards a system that allows anything to be anything. The Colon Classification system allows for almost limitless books. Only when we want something specific do we have to place restrictions on the system. In the digital world things will be ambiguous until they are given definition by our looking at them.
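Python is one language that works this way, so a small sketch can show the idea. The two helper functions are made up for illustration. No type is declared anywhere; the value itself carries its type, and the rules for that type are only checked when the value is actually used.

```python
def describe(value):
    """Report the type Python worked out for a value at run time."""
    return type(value).__name__


def add_twelve(value):
    """Add 12 if the value follows number rules; otherwise report the mismatch."""
    try:
        return value + 12
    except TypeError:
        # Raised when the value's type does not support adding a number.
        return f"cannot add 12 to a {describe(value)}"
```

Passing in the number 30 gives back a number, while passing in the word "thirty" gets the mismatch message, because the same function only finds out what it was handed at the moment it tries to do math with it.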

Tuesday, January 25, 2011

The Lack Of Time

Introduction:
The idea of breaking away from traditional sorting and storage is an interesting one. In Everything Is Miscellaneous, David Weinberger leads us into a discussion about it by first explaining the old system of organization and storage. This is storage in the physical world. His example of this is a Staples store. He talks about how Staples uses a special planning store to attempt to overcome the traditional limitations of the physical. This seems like an odd way to approach the subject. Most people never think about how a store is put together beyond the obvious practice of storing the things we want far from the entrance. So using the example of a Staples store to lead into a discussion of the storage of digital media seems a bit like telling someone how scoring works in a baseball game as a preface to explaining scoring in a soccer game.

That being said, the intro was interesting to read. I had no idea that there were stores out there like Staples that do not try to trick their customers into buying more than they intended. This is a refreshing idea and I can see how it may pay off in the long run. When I go to a store like Walmart I almost always come out with more than I intended. I know this and often avoid Walmart for this reason. Keeping me out of their store is not what Walmart intended when they started this system, but it has been the result. The one part of the introduction I really did like was the idea that everything in a store except the things you are looking for is just in your way. I have never thought about it this way and found doing so to be an enjoyable exercise.

Chapter 1:
I find it difficult not to rally against the viewpoints of the author in this chapter. He discusses how we need to change the way we interact with large amounts of information. He talks about how those interacting with data, for instance in the iTunes store, have more choices than are available to those who shop physically. It is hard for me to think about data on a computer as random. I have worked with computers for years and know that computers are machines of order. Almost everywhere large amounts of data are stored you will find a database. These databases are what allow consumers, such as those on iTunes, to find what they are looking for. These databases are the essence of order. They make it possible to store vast and almost unimaginable amounts of data. iTunes offers its users a search feature that lets them search its music database. With that being said, databases work very well only up to a point. Once a database becomes too large or sloppy it quickly becomes useless. I will use Flickr as an example of what I mean. The author mentions this site and states that it has around a million photos uploaded every day. So let's say I was browsing Flickr last week. I found an image that I liked. It was two dogs playing in the snow. I now want to find that image again to show to a friend. I search for "two dogs play in the snow" on Flickr and get 10,000 picture results. Because all the images are stored in a database it is easy for Flickr to pull up all the pictures relating to my search terms. However, the picture I want was tagged by the individual who uploaded it, and as such the dogs' names were included instead of the fact that they were dogs. They labeled the picture "Tiny and Big Jim play in the snow". Searching for dogs playing in the snow will never return this picture. It is for all practical purposes lost to me. This is similar to the problem the Bettmann photo archive has. When there is too much information, finding one piece of information can be very difficult.

Chapter 2:
I enjoyed chapter two. The idea of the alphabet as an arbitrary order is very interesting. Large data stores use a similar system for ordering. They use what is called a primary key. The primary key is the most important part of data storage in our modern age. It is a value, typically a number, assigned to each database entry that is completely unique to that entry. In this way it is able to keep all the data entries straight and separate from one another. The public would almost never see an entry's primary key because they don't need to. It is purely for exact organization of data on the computers. He goes on to talk about natural joints. This does not make a lot of sense to me. I don't really understand what they have to do with data storage. It was an interesting chapter, and I really liked the explanation of how the periodic table of the elements came to be. I really did not understand what impact this had on data storage. Perhaps it was to drive home the point that the same information can be stored more than once in our modern age and still make perfect sense due to context. The entire point of the aforementioned primary keys is to allow for the same data to be entered more than once. Because each piece of information has its own primary key you are able to keep the two data entries distinct from one another.
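A small sketch of the primary key idea, using Python's built-in sqlite3 module and an in-memory database. The `photos` table, its columns, and the `insert_twice` helper are made up for illustration. The same title is stored twice, and the database keeps the two rows apart by giving each its own id.

```python
import sqlite3


def insert_twice(title):
    """Store the same title twice and return the (id, title) rows."""
    conn = sqlite3.connect(":memory:")
    # In SQLite an INTEGER PRIMARY KEY column is filled in automatically
    # with a unique number when no value is supplied.
    conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT)")
    for _ in range(2):
        conn.execute("INSERT INTO photos (title) VALUES (?)", (title,))
    rows = conn.execute("SELECT id, title FROM photos ORDER BY id").fetchall()
    conn.close()
    return rows
```

The two rows come back with identical titles but different ids, which is all the database needs to treat them as separate entries.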

Monday, January 24, 2011

Smart Phones and Web Applications

The web is growing at a very rapid pace while previously unused data becomes the cornerstone of the new web. This is the direct result of smartphones. Smartphones collect worlds of data that are too tedious to be manually captured by a human. Instead, all we have to do is allow our phones to post this information. Most of this new functionality comes in the form of location. These changes have been phenomenal. I no longer take an atlas on road trips or even bother to Google my stops ahead of time. My phone provides instant data on local businesses. It is directions at my fingertips. This is the largest of the changes included in the expanded Web 2.0 article. The internet is no longer just a place for data to be passed from person to person. Now it passes from person to phone to phone to person. Our phones have become our web translators. They take what would be a jumble of meaningless data to a human and turn it into simple, easy to use forms that are a great help.

On a different note, my favorite web application of late is called Isle of Tune (www.isleoftune.com). It is fascinating that something as abstract as this amazes me. The creator of this app has connected two things that have no connection in real life, traffic and music. This site has entertained me for hours. I have no sense of music, so even after spending hours on the site my songs still sounded terrible. It was the one site that I could not help showing to everyone I encountered. It is musical and amazing. This sort of application is what I love about the web. It is where creativity meets skill. I really would like to code something like this someday. I really think everyone should have a look at it because it is very hard to describe. Basically it allows a user to create a little town that plays music as cars drive around it. There is no point to the site, other than to create, which is amazing. The web's ability to support pure creativity is why it is my favorite thing.