September 20, 2010 – We read with interest “A Revised Taxonomy of Social Networking Data.” Over the last month, the arguments in the article have provided a framework for thinking about the amount of information flowing through Facebook, Twitter, Tumblr, and other social media conduits.

Let’s revisit the elements of the social media taxonomy. The principal categories are, and we quote:

“Service data is the data you give to a social networking site in order to use it. Such data might include your legal name, your age, and your credit-card number. Disclosed data is what you post on your own pages: blog entries, photographs, messages, comments, and so on. Entrusted data is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data once you post it — another user does. Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it’s basically the same stuff as disclosed data, but the difference is that you don’t have control over it, and you didn’t create it in the first place. Behavioral data is data the site collects about your habits by recording what you do and who you do it with. It might include games you play, topics you write about, news articles you access (and what that says about your political leanings), and so on. Derived data is data about you that is derived from all the other data. For example, if 80 percent of your friends self-identify as gay, you’re likely gay yourself.”
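The “derived data” category in the quoted passage can be made concrete with a small sketch. The Python below is our own illustration, not a method described in the article or used by any particular site. The function name derive_attribute, the sample profile data, and the choice to ignore friends who disclose nothing are all assumptions. The 0.8 threshold simply mirrors the article's 80 percent example.

```python
from collections import Counter

def derive_attribute(friend_values, threshold=0.8):
    """Guess a user's undisclosed attribute from friends' disclosed values.

    Returns the most common value if its share among friends who
    disclosed a value meets the threshold; otherwise returns None.
    """
    # Assumption: friends who disclosed nothing (None) are ignored.
    disclosed = [v for v in friend_values if v is not None]
    if not disclosed:
        return None
    value, count = Counter(disclosed).most_common(1)[0]
    return value if count / len(disclosed) >= threshold else None

# Hypothetical data: the home city each friend lists on a profile.
friends_city = ["Louisville", "Louisville", "Louisville", "Louisville", "Boston"]
print(derive_attribute(friends_city))
```

The point of the sketch is that the user never supplied the value. It is produced entirely from other people's data, which is why the user has no control over it.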

One question we discussed recently is, “Who controls the metadata?”

In a traditional indexing operation, the database publisher manages the terms, assigns them, and controls access. Usage data is provided by some vendors, but often that information is separate from the controlled terms.

In social media, there is a shift from the traditional term list to usage information. The result is that social data call for a different way of thinking about tagging, its utility, and its implications. The content takes a back seat to information about users.

“Who controls the metadata for social media?” Certainly not the user. The categories seem to emerge from user actions. The output is a new set of data, and the “metadata” is essentially the output of traditional data mining and clustering, among other processes.

The second question is, “Is this type of tagging metadata?”

On the surface, the answer is “Yes.” As we discussed this issue, three observations surfaced:

First, in classical indexing the rules are known and can be explained. A term from a controlled list is assigned to an information object. Period. Clean. Easy to understand. Social media tagging takes a different approach. The categories are emergent.

Second, metatagging in a commercial database operation is intentional. In the social media space, emergent processes operate. We don’t think this difference has been fully explored.

Third, traditional tagging operations impart consistency, which makes results easier to test and benchmark. In a social media taxonomic system, consistency may not be a key factor. One may do less testing and more exploration or discovery.
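The contrast running through these three observations can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not a description of how any database publisher or social network works. The term list, the user names, the helper names assign_term and emergent_pairs, and the use of simple co-occurrence counting as a stand-in for data mining and clustering are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical controlled vocabulary managed by a database publisher.
CONTROLLED_TERMS = {"taxonomy", "indexing", "metadata"}

def assign_term(record_terms, term):
    """Classical indexing: only a term on the controlled list may be assigned."""
    if term not in CONTROLLED_TERMS:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    record_terms.add(term)
    return record_terms

def emergent_pairs(user_actions, min_users=2):
    """Emergent categories: pairs of items that at least min_users users share."""
    counts = Counter()
    for items in user_actions.values():
        # Count each unordered pair of distinct items once per user.
        counts.update(combinations(sorted(set(items)), 2))
    return {pair for pair, n in counts.items() if n >= min_users}

# Hypothetical behavioral data: topics each user has engaged with.
actions = {
    "user_a": ["chess", "politics", "cooking"],
    "user_b": ["chess", "politics"],
    "user_c": ["cooking", "travel"],
}
print(assign_term(set(), "indexing"))
print(emergent_pairs(actions))
```

In the first function the rule is explicit and testable: a term is either on the list or it is rejected. In the second, nobody chose the categories in advance. They fall out of whatever the users happened to do, and they change when the behavior changes.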

Our view is that more work must be done with regard to a social media taxonomy. The difference between traditional indexing and social media indexing seems significant and not well understood. Social media tagging seems to be about business. Traditional indexing is about findability.

Users may be surprised by the intent of social media tagging.

Ken Toth