When I was in school I took an Information Architecture class that required a readings journal. Some of those entries deserve revision.
I was, at the time, especially interested in the differences, similarities and interrelationships between classification and categorization. What follows began life as Studer: Classification v. Categorization. The first version was written in November 2001.
Studer, P.A. (1977). Classification as a general systems construct. In B.M. Fry & C.A. Shepherd (Comp.) Information management in the 1980's: Proceedings of the [40th] ASIS Annual Meeting, Chicago, Illinois, September 26-October 1, 1977 (pp. 67, C6-C14, A1-A9). White Plains, NY: Knowledge Industry for American Society for Information Science.
The Studer article suggests there is a lack of consistency in the literature in the use of the terms classification and categorization. Studer himself uses the terms carelessly, especially when quoting: he uses the term classification in his own text, while the text he quotes uses category.
Studer makes it sound like the process of creating classifications is a step following the creation or identification of categories. This conflicts with my interpretations.
In my view classification is an artificial (synthetic, non-fundamental) process by which we organize things for presentation or later access. It involves the arbitrary creation of a group of classes which have explicit definitions and may be arranged in a hierarchy. In other words a class is strictly defined and once inhabited the inhabitants can be enumerated.
Categorization, on the other hand, is a natural process in the sense that humans do it as part of their cognitive fundament. It is, as Studer reports, an act of simplification to make apprehension and comprehension of the environment more efficient. Categories spring up out of necessity and, because they are designed to replace the details of definition, are themselves resistant to definition. When provided with a list of stuff we are able to categorize the stuff, but when asked to list the full contents of a category we cannot.
So to put it more succinctly:
- a class is a defined grouping of entities in which the members fulfill the definition of the class and can be listed.
- a category is a cognitive label applied to a non-enumerable grouping of entities wherein membership is determined by typicality amongst the members and not some overarching definition.
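To make the contrast concrete, here is a minimal Python sketch. Everything in it is my own illustration, not something from Studer: the even-number class, the bird exemplars, the three made-up features and the `typicality` formula are all assumptions. Note that the "category" half is itself computed by a rule (a distance formula), so it only imitates categorization.

```python
from math import dist

# A class: membership follows from an explicit definition,
# so the members can be enumerated.
def in_class(n: int) -> bool:
    """Hypothetical class: even integers from 0 through 9."""
    return 0 <= n < 10 and n % 2 == 0

class_members = [n for n in range(10) if in_class(n)]

# A category, imitated: there is no definition, only some known members.
# The features (flies, sings, lays_eggs) and their values are invented.
exemplars = {
    "robin": (1.0, 1.0, 1.0),
    "sparrow": (1.0, 0.9, 1.0),
    "crow": (1.0, 0.3, 1.0),
}

def typicality(features: tuple) -> float:
    """Score closeness to the known members; higher means more typical."""
    distances = [dist(features, e) for e in exemplars.values()]
    mean_distance = sum(distances) / len(distances)
    return 1.0 / (1.0 + mean_distance)

finch_like = (1.0, 0.9, 1.0)    # flies, sings, lays eggs
penguin_like = (0.0, 0.0, 1.0)  # lays eggs, but neither flies nor sings

print(class_members)
print(typicality(finch_like) > typicality(penguin_like))
```

The class can be listed in full because its definition generates it. The category only ever yields a degree of fit for whatever item it is handed, and asking it for "all birds" makes no sense.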
This is important to me, in part, because I'm playing around with trying to determine if computers can ever be actually intelligent or must always fake it. I vote for the latter because computers, thus far, cannot categorize.
The ability to categorize may be the basis for intelligence (On Intelligence, by Jeff Hawkins, presents some data to support this, as well as some assertions that may blow my "thus far" out of the water, given time). On-the-fly categorization allows us to place data in an informational context. Once the data is in that matrix, we can carry on what amounts to an endless recursive dialectic wherein each new synthesis becomes a thesis.
Computers can presumably replicate this process, but if they do, it is imitation. Their distinctions must be made by definition, by classification, not categorization. They can be made to appear to categorize, but the alternate representations they provide are rule-based (that is, definition-based). Until recently the most promising research in creating seemingly intelligent machines has used what can be called a brute-force approach: supply the computer with as much information as possible, related in as many ways as possible. This is the method that IBM used to get Deep Blue to beat a reigning world chess champion, and it is one of the keys to the Semantic Web.
If we want to create truly intelligent machines we must determine how categorization works. I wonder, though, why we want intelligent machines. What do we gain from that? Don't we instead want machines that are tools to augment our own intelligence? If that's the case, then we already have the understanding we need to make progress: we simply need to improve on what we have.