CloudKick Monitoring of Load-Balanced HTTP Services

Posted by Matt Farmer on August 24, 2012 · 3 mins read

So, recently we made the jump at OpenStudy from a single monolithic process to a multi-process architecture for the main site. When engineering this, Antonio split our Actors (which manage much of our business logic) and our web frontend components (Comets for realtime push, session information, etc.) into two separate entities designed to run with RabbitMQ in the middle. The good part is that we can now scale our web load horizontally, and we’ve been serving the site from two web processes since we made this change, which has been great.

Naturally, however, any good change introduces some bugs. We’ve had quite a few troublesome ones, made worse by the fact that we didn’t have a good way to monitor individual web processes from CloudKick. There didn’t seem to be a published solution for monitoring individual nodes in a load-balanced array short of pulling down CloudKick’s IP addresses and allowing them through your firewall, which we hated because it would require our firewall rules to change regularly and automatically. So we were stuck with an external monitor on the public URL. When one process was failing while the others were fine, that monitor would send a stream of failure and recovery emails, because HAProxy round-robined between servers and each successive check landed on a different one. Not ideal.
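
To make the flapping concrete, here is a minimal sketch that simulates it. Everything in it is an assumption for illustration: the backend names `web1` and `web2`, their up/down states, and the idea that checks arrive strictly one after another. It makes no network calls and is not our actual setup, just a toy model of an external monitor hitting a round-robin balancer with one bad process behind it.

```shell
#!/usr/bin/env bash
# Toy simulation (hypothetical names and states): two backends behind a
# round-robin balancer, one healthy and one failing. No real requests are made.

# Assumed backends in rotation order, written as name:state.
backends=("web1:up" "web2:down")

# Simulate N successive external checks against the public URL.
simulate_checks() {
  local n=$1 i idx entry state
  for ((i = 0; i < n; i++)); do
    # Round-robin: check i lands on backend (i mod number of backends).
    idx=$((i % ${#backends[@]}))
    entry=${backends[idx]}
    state=${entry#*:}   # strip the "name:" prefix, leaving up or down
    if [[ $state == up ]]; then
      echo "OK"
    else
      echo "FAIL"
    fi
  done
}

# Count changes between consecutive results read from stdin.
# Each change stands in for one failure or recovery email.
count_transitions() {
  local prev="" cur count=0
  while read -r cur; do
    if [[ -n $prev && $cur != "$prev" ]]; then
      count=$((count + 1))
    fi
    prev=$cur
  done
  echo "$count"
}

simulate_checks 6
```

Running `simulate_checks 6 | count_transitions` on this model reports 5 state changes in six checks, which is roughly the alert storm we were seeing: the site as a whole is half broken the entire time, but the monitor thinks it keeps failing and recovering.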

So, about midnight on Tuesday I had a stroke of genius. CloudKick supports plugins. Maybe I could look into those and see how hard they are to build, right? Turns out, they’re really easy, and with some bash scripting magic I was able to come up with a CloudKick plugin that uses curl to monitor a local service on whatever machine it’s running on. This is the latest version of what resulted from that thought: