Thinking About Tests - Matt Farmer

I’ve been thinking a lot about tests lately. The automated, software development kind, not the school kind. I’ve been thinking a lot about tests because I’ve realized that my thinking about them has been broken. Specifically, around how to test Lift web applications. Something that I’ve had to come to terms with is that my schooling and experience up until a few years ago did a horrible job of educating me regarding when tests are needed and why.

I remember being taught a lot of textbook mantras about development when I was in school. Things such as, “testing will prove your software works” and “testing will help you know if you broke something else.” For a long time, I regarded the former mantra with a bit of incredulity (“I test it when I’m writing it, why am I going to the effort of writing more code to do the thing I’ve already done?”) and the latter with an air of hubris (“If I build my code correctly, the scope of any change will be limited and I’ll catch problems myself.”)

For what it’s worth, I think the Test Driven Development evangelists got it wrong too. At least some of them did. Testing as a matter of doctrine without understanding the whys and wherefores will tend toward the software engineer’s version of hoarding if done without care. Tests keep adding up and adding up, some may not be written properly, and changing any minor detail in the application causes an increasing number of tests to fail over time – at least some of which are failures that are a product of how a particular test is being run rather than what is being tested. I understand now that there are lots of good tools to help avoid that, but there were (and are) enough examples of people not using those tools that my general reaction to the practice was summed up in “Why?”

Though I haven’t sipped of the TDD kool-aid, I do take testing a lot more seriously than I once did. So, what changed?

Well, at work we’re engineering at a much larger scale than anything I’ve ever been a part of before. It’s literally impossible for one person to have the entire scope of the system in their head at any given time. It doesn’t all fit. As a result of this, it’s very easy for someone on our team to make a change without knowing the downstream implications before they make the change. Work in that kind of environment without sufficient testing, intentionally or otherwise, and you’re in for a world of hurt as a software engineer. All of a sudden your change to a line of XML is causing buttons to go missing in the running application and you have no idea why.

So, maybe the answer is to just write as many tests as you can and by doing so you’ll win, right? If only it were so.

I’ve also learned, first with Anchor Tab and again later on, that if you choose to test incorrectly, then you’re also in for a world of hurt. Take Anchor Tab as an example. I decided to use Selenium to test everything when I started adding tests to the codebase. I thought I was incredibly clever, because, after all, what I cared about was that the user could do what they wanted to do.

What I’ve found to be true as a result of that experience is that Selenium isn’t really a great API to do the same task as a unit test. The inherent contract it imposes of acting like a user is simultaneously its biggest strength and the blade that you will cut yourself on if you try to bend that contract to your will in unintended ways. Selenium, and technologies like it, is fantastic and certainly has a role in a larger testing strategy, but it isn’t the larger testing strategy – even for a web application. It doesn’t replace unit or integration testing, as I thought, but supplements it in different ways.

As a result of these realizations and ponderings, I’ve realized my Selenium misadventures stemmed from what College Student Matt learned about testing. I would teach College Student Matt differently today.

Instead of teaching things like “tests prove your app works” and so on, I would do my best to drill into his head the idea that tests are tools for answering a question. They are not something that magically makes your code harder, better, faster, or stronger. A test is merely a piece of code that answers a question about your code, and you should probably know what that question is before writing your test. Also, the question itself affects the kind of test you need to write, and making the right decision on the type of test is important for minimizing future work on the test while simultaneously maximizing the usefulness of the test. At the end of the day, you’re answering this question for someone else, and making that person’s life as easy as possible should be a priority.

So, for example:

If you are writing a method that parses an HTML tag into a data structure, one question you’ll want to ask is “Will my method behave sensibly for all kinds of input it could see?” and you’ll want to write a unit test on that method, where the method itself is completely isolated from any other component of the system.
If you’ve finished writing and wiring up a series of components that all work together, the question you’ll want to ask is “Is the entire series of components behaving sensibly today?” and you’ll want to write an integration test on that series of components where everything is As Real As Possible™.
If you’ve finished building out a new series of screens in your web application, the question you’ll want to ask is “Can a user successfully accomplish their goal in the web application?” and you’ll want to write an acceptance test using a tool like Selenium or PhantomJS.
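
To make the first of those concrete, here is a rough sketch of what “a test that answers a question” could look like for the tag-parsing case. Everything in it is made up for illustration: `Tag`, `TagParser`, and `TagParserSpec` are hypothetical names, the parser only understands a single opening tag with double-quoted attributes, and I’m using plain `assert` calls where a real project would more likely reach for a test framework. The point is only that each check maps back to the question “Will my method behave sensibly for all kinds of input it could see?”

```scala
// Hypothetical example: these names are illustrative and are not part of Lift.
final case class Tag(name: String, attributes: Map[String, String])

object TagParser {
  // Matches one opening tag, e.g. <a href="/home" class="nav"> or <br />.
  private val TagPattern =
    """<\s*([A-Za-z][A-Za-z0-9]*)((?:\s+[A-Za-z-]+\s*=\s*"[^"]*")*)\s*/?>""".r

  // Matches one name="value" pair inside the tag.
  private val AttributePattern = """([A-Za-z-]+)\s*=\s*"([^"]*)"""".r

  // Returns None instead of throwing when the input is not a tag we understand.
  def parse(input: String): Option[Tag] = input.trim match {
    case TagPattern(name, rawAttributes) =>
      val attributes = AttributePattern
        .findAllMatchIn(rawAttributes)
        .map(m => m.group(1) -> m.group(2))
        .toMap
      Some(Tag(name.toLowerCase, attributes))
    case _ => None
  }
}

object TagParserSpec {
  def main(args: Array[String]): Unit = {
    // The ordinary case: a tag with a couple of attributes.
    assert(
      TagParser.parse("""<a href="/home" class="nav">""") ==
        Some(Tag("a", Map("href" -> "/home", "class" -> "nav")))
    )

    // Reasonable variations: no attributes, self-closing, upper-case name.
    assert(TagParser.parse("<p>") == Some(Tag("p", Map.empty)))
    assert(TagParser.parse("<br />") == Some(Tag("br", Map.empty)))
    assert(TagParser.parse("<DIV>") == Some(Tag("div", Map.empty)))

    // Input that is not a well-formed tag should be rejected, not blow up.
    assert(TagParser.parse("").isEmpty)
    assert(TagParser.parse("not a tag").isEmpty)
    assert(TagParser.parse("<>").isEmpty)
    assert(TagParser.parse("""<a href="unterminated>""").isEmpty)

    println("All tag parser checks passed")
  }
}
```

Notice that nothing here touches a database, a browser, or any other component. The method is isolated, so a failure can only mean the parser itself answered the question badly.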

The unit tests and integration tests should try to think of all reasonable scenarios and ensure that things function correctly, the former focusing on an individual method and the latter on a series of components. (Do note I wrote all reasonable scenarios, not all possible scenarios.) Meanwhile, acceptance tests should focus on determining whether or not a user interacting with an application can accomplish a set of goals. What’s the difference, you ask?

Well, I hold the opinion that acceptance tests should not overly concern themselves with a lot of error conditions. I came to that conclusion after writing a lot of Selenium tests that do test for a lot of error conditions. What I ended up with was a series of tests that took a long time to run, tended to be a bear to update when things changed, and gained a reputation on our team for being unreliable due to a lot of obscure interactions that I believe were the result of my distorting the intended Selenium contract.

Verily, things will slip through even if you have an excellent net of unit tests, integration tests, and acceptance tests. But here’s the deal: even with 100% code coverage, things will slip through. Sorry, but it’s true. That one obnoxious customer is going to enter that one Unicode character you didn’t test for and your whole app is probably going to blow up because your particular OS-specific version of your database recognizes that as an escape sequence due to an obscure bug in character processing. Welcome to software!

The question for you is how tightly can you close that net without making you and everyone else on your team want to pull their hair out? For my part, the new guidelines that I’m going to be suggesting for adoption to our team at work will probably look something like this:

Lift snippets, actors, etc. get unit tested unless they’re basic snippets, where “basic” means an ID has been pulled from a URL, something has been retrieved from the database, and its contents have been rendered on a page. That removes about 99% of the admin screens we have for just doing things like listing a bunch of objects with links for edit and delete.
JavaScript gets tested if it contains any business logic. If any JS is looking at data that originated from our database and evaluating what it “means” and acting on that, it needs unit testing at least. Perhaps integration testing in addition to / in place of the unit tests depending on the situation.
The app has acceptance tests that represent goals from the point of view of a user. Ideally, these tests run in some sort of headless mode, for speed, both locally and for every pull request we make, and run against our entire browser matrix before release. We’ve gotten into the habit of making them more unit-test-like than any of us enjoy working with, and we have to run the browser to run them today, which is a bear for different reasons.

The goal, as always, is for tests to be useful without being frustrating to maintain. And I think these guidelines will strike that balance for us.

So, hopefully you’ve found my ramblings on testing enlightening. I hope you’ll consider sharing this with your friends and letting me know what you think. I always love hearing good feedback on these posts.

Until next time, folks.
