Taxonomies do not belong to any one science or interest. They can be applied anywhere, from a grocery store to plant anatomy to hip hop music.

A music fan (err, geek) decided to compare the vocabulary of the literary world’s giant, Shakespeare, to that of famous hip hop artists. Shakespeare’s vocabulary across his entire corpus is said to be 28,829 words, a figure that suggests he knew over 100,000 words. This is possibly the largest vocabulary ever.

The experiment began with the first 35,000 words of each artist’s lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake. 35,000 words covers three to five studio albums and EPs. Mixtapes were included if the artist was just short of 35,000 words. Quite a few rappers didn’t have enough official material to be included. As a benchmark, data points for Shakespeare and Herman Melville were included using the same approach.

They used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, regardless of tense, use or plurality.
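For readers who want to see what that kind of count looks like in practice, here is a minimal Python sketch. It is not the code used in the study. The function name `unique_word_count`, the tokenizing rule and the choice to drop apostrophes are all assumptions made for illustration. The sketch also treats every distinct spelling as its own word; merging related forms such as plurals or tenses would need an extra stemming step that is not shown here.

```python
import re


def unique_word_count(lyrics, limit=35_000):
    """Count distinct words among the first `limit` words of `lyrics`.

    Hypothetical helper for illustration only, not the study's code.
    """
    # Assumption: ignore case and drop apostrophes (straight and curly)
    # so that spellings like "pimpin'" and "pimpin" count as the same word.
    text = lyrics.lower().replace("'", "").replace("’", "")

    # Assumption: a word is any unbroken run of letters or digits.
    tokens = re.findall(r"[a-z0-9]+", text)

    # Keep only the first `limit` words so that prolific and newer
    # artists are measured on samples of the same size.
    sample = tokens[:limit]

    # Each distinct spelling is counted once.
    return len(set(sample))


sample_lyrics = "Rhyme after rhyme, the rhymes keep rhyming; time after time."
print(unique_word_count(sample_lyrics))
```

Running the same function over every artist’s lyrics, with the same limit, would give one comparable number per artist.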

There were 85 artists in the data set. Aesop Rock claimed the top spot, well above every other artist in the data set. Wu-Tang Clan claimed spots 2, 6, 7, 9, 20, and 23. At the other end of the spectrum, at spot 85, was DMX. You can view the interactive infographic to learn more about the performers who fell in between these marks.

It is interesting that some of the biggest names in hip hop were in the bottom 20%. It is also important to point out that the breadth and depth of a performer’s vocabulary does not directly align with their quality as a performer or songwriter. This is just data, and as with all data, there are stories to tell.

We know the power of a taxonomy in achieving findability in data. Even in topics that may hold little to no interest for you, there are applications for the science of taxonomy. The important thing to remember is that a strong, standards-based taxonomy is one with true integrity. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.

Melody K. Smith

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.