For every day Congress is in session, Capitol Words visualizes the most frequently used words in the Congressional Record, giving you an at-a-glance view of which issues lawmakers address on a daily, weekly, monthly and yearly basis. Capitol Words lets you see the most popular words spoken by lawmakers on the House and Senate floor.
The contents of the Congressional Record are downloaded daily from the website of the Government Printing Office. The GPO distributes the Congressional Record in ZIP files containing the contents of the record in plain-text format.
Each text file is parsed and converted into an XML document, with elements such as the title and speaker marked up. The text of each file is then split into words and phrases, each one to five words long.
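The splitting step can be illustrated with a short Python sketch. This is not the Capitol Words code: the function name ngrams, the max_n parameter and the tokenization rule (lowercase, split on anything that is not a letter, digit or apostrophe) are assumptions made for the example.

```python
import re


def ngrams(text, max_n=5):
    """Yield every run of 1 to max_n consecutive words found in text."""
    # Assumed tokenization: lowercase the text and keep runs of letters,
    # digits and apostrophes. The real parser may tokenize differently.
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(1, max_n + 1):
        # Slide a window of n words across the token list.
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])
```

With max_n=2, the text "Mr. Speaker, I rise" would produce four single words ("mr", "speaker", "i", "rise") and three two-word phrases ("mr speaker", "speaker i", "i rise").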
The resulting data is saved to a search engine.
Capitol Words has data from 1996 to the present.
Words/phrases over time
The graphs on the Capitol Words site that show the occurrences of words and phrases over time use the relative frequency of the word or phrase, expressed as a percentage. That is, the number of times the given word or phrase occurred, divided by the total number of words or phrases of that size for that time period, multiplied by 100. This gives a better idea of how popular a term really was because it accounts for how much was said in Congress during that time period.
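As a minimal sketch of that calculation in Python (the function and parameter names are hypothetical, and returning zero for a period with no recorded phrases is an assumption):

```python
def relative_frequency(occurrences, total_same_size):
    """Return occurrences as a percentage of all phrases of the same length.

    occurrences: times the word or phrase occurred in the time period.
    total_same_size: total words or phrases of that size in the same period.
    """
    if total_same_size == 0:
        # Assumption: a period with no recorded phrases scores zero.
        return 0.0
    return occurrences / total_same_size * 100
```

For example, a phrase used 300 times in a busy month with 6,000,000 phrases of its length scores about 0.005, while 100 uses in a quiet month with 1,000,000 such phrases scores about 0.01. The raw count is lower in the quiet month, but the phrase was relatively more prominent.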
To calculate the top words and phrases for each entity, Capitol Words uses a weighted formula similar to term frequency–inverse document frequency. This formula gives a higher weight to words and phrases used more frequently by the given entity than by others.
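The exact weighting Capitol Words uses is not given here, so the following Python sketch shows plain term frequency–inverse document frequency with each entity treated as one "document". The function top_terms, its parameters and the input layout (a dict mapping each entity to its phrase counts) are all assumptions for illustration only.

```python
import math
from collections import Counter


def top_terms(counts_by_entity, entity, k=3):
    """Rank one entity's terms by a standard tf-idf score (illustrative only)."""
    n_entities = len(counts_by_entity)

    # For each term, count how many entities used it at least once.
    entity_freq = Counter()
    for counts in counts_by_entity.values():
        entity_freq.update(counts.keys())

    own = counts_by_entity[entity]
    total = sum(own.values())

    # tf: share of this entity's usage; idf: log of how rare the term is
    # across entities. A term used by every entity gets a weight of zero.
    scores = {
        term: (count / total) * math.log(n_entities / entity_freq[term])
        for term, count in own.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With this scoring, a word such as "budget" that every entity uses drops to the bottom of each list, while a word that one entity uses heavily and others rarely use rises to the top.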
The Capitol Words site runs on a built-in API, which is available to developers. To begin using the API, register for an API key at Sunlight Data Services.
API documentation is available at http://capitolwords.org/api/.