Library of Congress Has Archive of 170 Billion Tweets
Posted on January 4, 2013
The Library of Congress announced it has an archive of 170 billion tweets that goes back to 2006. The archive grows by nearly half a billion tweets a day. Unfortunately, there is no way yet for the public to access the enormous tweet database.
The Library's focus now is on addressing the significant technology challenges to making the archive accessible to researchers in a comprehensive, useful way. These efforts are ongoing and a priority for the Library.
It should not be a surprise that the Library of Congress has not yet come up with a way to share the information, as even Twitter itself allows searching of only recent tweets. Twitter did recently add a feature that lets you download your own Twitter archive.
Twitter is a new kind of collection for the Library of Congress but an important one to its mission. As society turns to social media as a primary method of communication and creative expression, social media is supplementing, and in some cases supplanting, letters, journals, serial publications and other sources routinely collected by research libraries.
The Library of Congress has the tweets, but they are in a very raw form, according to a Washington Post story. Deputy Librarian of Congress Robert Dizard Jr. told the Post, "People expect fully indexed - if not online searchable - databases, and that's very difficult to apply to massive digital databases in real time."
There is also a controversy over whether to display deleted tweets. Some argue that the Library of Congress should display them in its database. How this is resolved remains to be seen. It will be some time before the database itself is seen at all, as the Library of Congress does not yet have the technology to display it.