Data Mining the Commentsphere

Posted on May 18, 2006

Micropersuasion.com has a post about a Nielsen BuzzMetrics study (PDF) that analyzed comments found on blogs. Here are some of the findings from the study, according to Steve Rubel.

  • The number of comments in the entire blogosphere is comparable to the number of posts in active, non-spam blogs. Therefore comments constitute up to 30% (150,000) of the daily volume of blog posts (700,000), according to BlogPulse data
  • Less than 2% of all blog comments are syndicated in feeds
  • The textual size of the commentsphere is 10 to 20% of the blogosphere
  • Use of comments is beneficial for ranking blog posts in useful ways
  • They demonstrate with data that comments are an indicator of the popularity of a weblog
  • They also do the same for controversy; high comments = high controversy

Steve Rubel also blogs about mining the data contained in blog comments:

    Clearly comments are undiscovered country and Nielsen BuzzMetrics is working hard to figure out how to search this critical data pool and use it to measure influence. Hear, hear. This data is essential and it's underutilized, yet difficult to mine.

Mining blog comments for intelligence will be a difficult and often unfruitful mission. Imagine what they will discover when they mine the millions of comments from the celebrity gossip blogs alone.
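
For anyone curious what the simplest version of "ranking posts by comments" might look like, here is a rough Python sketch. It is not the method used in the study; it only sorts posts by raw comment count as a stand-in for popularity. The `Post` class, the `rank_by_comments` function, and the sample posts are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """A hypothetical blog post record; the fields are illustrative only."""
    title: str
    comment_count: int


def rank_by_comments(posts):
    """Return posts ordered from most-commented to least-commented.

    Raw comment count is used as a crude popularity signal. This is an
    assumption for the sketch, not the study's actual ranking method.
    """
    return sorted(posts, key=lambda post: post.comment_count, reverse=True)


# Invented sample data, not figures from the study.
sample_posts = [
    Post("Quiet site update", 2),
    Post("Celebrity gossip roundup", 148),
    Post("Product review", 37),
]

for post in rank_by_comments(sample_posts):
    print(f"{post.comment_count:>4}  {post.title}")
```

A real system would have to deal with spam comments and with the fact that so few comments are available in feeds, which is a large part of why this data is hard to mine.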