January 2014 Update on GAODP

Posted by Matt Farmer on January 25, 2014

Hello folks! It’s been a minute since I updated you guys on what’s going on with the Georgia Open Data Project.

For those of you who don’t know, last year I started working on something I’ve taken to calling the Georgia General Assembly API (or, gga-api for short). The gga-api is designed to serve information on the activities of the Georgia General Assembly in a RESTful JSON format, which is much nicer than the WSDL API that the General Assembly provides from their systems, and to add some higher-order bells and whistles that aren’t built into the General Assembly’s API.
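To give you a flavor of what I mean by “nicer,” here’s a rough sketch of what pulling data out of the gga-api might look like. Everything here, the hostname, the route, and the field names, is hypothetical and invented for illustration; check the GitHub project for the real endpoints.

```python
import json
import urllib2  # Python 2, which is what I'm running at the moment

# Hypothetical base URL and route, invented for illustration;
# see the GitHub project for the real endpoints.
BASE_URL = "http://api.example.org/gga-api"

# Fetch a JSON list of (hypothetical) House members.
response = urllib2.urlopen(BASE_URL + "/legislators?chamber=house")
legislators = json.load(response)

for legislator in legislators:
    # Field names here are illustrative only.
    print "%s (District %s)" % (legislator["name"], legislator["district"])
```

Compare that to parsing a SOAP envelope out of a WSDL service and you can see why I think plain JSON over HTTP lowers the barrier to entry.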

My hope in starting this project was that, by breaking down the barriers to getting data about the Georgia General Assembly’s activities, we could see a renaissance of sorts in Georgia, where people like me, who have no interest in following political events throughout the year, could get information about their state representatives and senators when it matters most: election season.

We’ve made good progress towards breaking those barriers down. The gga-api is able to serve up most of the data that’s available directly from the WSDL API. However, we ran into performance issues while attempting to host the box on the cheap on DigitalOcean’s $5/mo plan, and those issues pretty much halted further development. Between the performance problems and the number of network outage notices I’ve received from them in the past several months, I decided to keep my money at Linode and move the service back to my personal server.

It was high time to blow away my personal server again anyway, so to facilitate this process I started building out Chef cookbooks to automate the deployment. I got held up trying to get my Jenkins recipes working so that we would launch with continuous deployment from GitHub. I was unsuccessful in that endeavor and, to boot, we missed our January 15th goal of being fully self-sufficient and updating nightly.

Last night I made the decision to stop waiting on Jenkins to be ready. I backed up the few things left on jethro that I wanted, then proceeded to take a cannon to its hard drives. Barring some minor issues with the Linode StackScripts, everything went according to plan. The Anchor Tab staging site was back online with a fresh database in around 45 minutes, and a clean copy of the gga-api came up shortly thereafter.

As of now, the new copy of the gga-api has the same data on it that it had on DigitalOcean. The IDs of objects have changed, because it is a fresh import, but we are back up to having detailed legislative information for the past two legislative sessions, which is about where we were sitting before. All that’s left is to change the DNS settings so the domain points to the correct location.

Looking ahead, we have a few items that we need to cover soon:

  • We need to finish the imports of detailed legislative information. I’m going to be kicking off these jobs on jethro over the next several days. I hope to have all the detailed data that the GGA themselves have before the end of the weekend.
  • We need to improve the level of detail we have. I’m pretty sure we are still missing some fields that you can find on the actual GGA website.
  • Hourly/nightly/weekly imports for new data. This is my next big-ticket item: figuring out what things change how often and setting up the import triggers appropriately. Figuring out how to avoid accidentally overloading the GGA servers with my requests would also be nice; there’s a rough sketch of what I have in mind after this list.
  • Continuous deployment. When we push new code to master, the new code should just appear on the server.
  • Correctness. This is a problem I haven’t quite figured out how to tackle: How do we ensure our data is correct? I’m afraid the only option is for people to start using it and report issues when they find them. If you have any other creative solutions, please shout them out.
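On the import-scheduling item above, the rough shape I have in mind looks something like the sketch below: a cron-driven job that spaces out its requests so we don’t hammer the GGA’s servers. All the names and URLs here are placeholders, not code that exists in the gga-api today.

```python
import time
import urllib2

# Hypothetical list of GGA resources to refresh; the real
# importer's resource list would live in the gga-api codebase.
RESOURCES = [
    "http://example.org/gga-wsdl/sessions",
    "http://example.org/gga-wsdl/members",
    "http://example.org/gga-wsdl/legislation",
]

# Minimum delay between requests, so a scheduled run (say, a
# nightly "0 3 * * *" crontab entry) trickles requests out
# instead of firing them all at the GGA's servers at once.
SECONDS_BETWEEN_REQUESTS = 5

def store(body):
    # Hypothetical persistence hook; the real importer would parse
    # the response and write it into the gga-api's database.
    pass

def refresh_all():
    for url in RESOURCES:
        store(urllib2.urlopen(url).read())
        time.sleep(SECONDS_BETWEEN_REQUESTS)

if __name__ == "__main__":
    refresh_all()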

As it stands, the gga-api is still in alpha/beta. The data should be taken with a grain of salt and, as always, verified before acting on it. But the migration to my personal box at Linode means we’re getting more bang for the same buck.

Please leave me some comment love with your thoughts on the project, questions, or concerns. If you’ve found something wrong with the data we’re serving up, report an issue on our GitHub project.

Until next time, folks!