Two years ago, I ended up building a functional test codebase that, at its peak, took four hours to run. Considering we had over 1,500 tests, that duration seems reasonable, but the tests were fragile in several ways.
- Because the tests relied on live data, they were prone to fragility and race conditions.
- Database reads reflect present values, not *correct* ones.
- We spent a lot of time waiting for data to become available after inserting it.
- Tests constantly mutated the system's state.
When things went wrong, what actually went wrong? Was it the insertion of data? Was the intent of the test screwed up somehow? Was I even testing the right page or area?
It turns out the best way to test a UI is to be sure of what you're testing, to write it down in plain, simple terms, and to test only what you intend to test without affecting other parts of the system.
Enter Cucumber. Before working with Cucumber, I never kept a proper hierarchy of tests, steps, and underlying fixtures like I do now. Cucumber in some ways enforces this relationship, where ad hoc systems do not.
Cucumber tests follow a grammar that makes them regular and easy to write:
    Scenario: Right arrow advances one image
      When clicking the right arrow on the gallery image
      Then slide 1 is displayed
Scenario, When, and Then are part of Cucumber's DSL; they mark lookups into the step definition files. Step definition files in Cucumber are plain Ruby files. Each statement is put together with a bit of magic that defines a regex to match against the list of steps:
    When /^clicking the (left|right) arrow on the gallery image$/ do |side|
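A complete definition needs a body and a closing end. Here's a minimal sketch of how this one might be filled in; the body is illustrative, assuming the page object shown below exposes one class-level accessor per arrow:

    When /^clicking the (left|right) arrow on the gallery image$/ do |side|
      # `side` captures "left" or "right" from the step text. This body is
      # hypothetical: it picks the matching accessor on the page object
      # (e.g. right_arrow) and clicks it.
      GalleryWidget.send("#{side}_arrow").click
    end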
The matcher line alone clearly states the intent of the keyword in code. The implementation of GalleryWidget in this case is irrelevant to the tester, but I'll include it for completeness (greatly simplified):
    class GalleryWidget < BasePage
      link('right_arrow', :css, 'a.right.carousel-control')
    end
In our system, link creates an interactive page object and stores it in a class method for later use.
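To make that concrete, here is a minimal sketch of how such a link macro could be built. This is not our implementation; it assumes a single shared selenium-webdriver handle, and the browser helper is hypothetical:

    require 'selenium-webdriver'

    class BasePage
      # Hypothetical shared browser handle; a real suite would manage its
      # lifecycle (startup, teardown, per-scenario resets).
      def self.browser
        @@browser ||= Selenium::WebDriver.for(:firefox)
      end

      # `link` defines a class method named after the element. The locator is
      # resolved on every call, so the lookup always reflects the page as it
      # currently exists in the browser.
      def self.link(name, how, locator)
        define_singleton_method(name) do
          browser.find_element(how, locator)
        end
      end
    end

With that in place, GalleryWidget.right_arrow returns a live element, and the step body's click lands on it.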
We do more complicated stuff in the step files, but that's not the point of this post. The point is to have a direct 1:1 mapping between our test steps and the underlying objects while keeping the tests themselves appropriately verbose.
Since all assumptions are written out in plain English, we don't have to worry about the intent of the test cases being lost. And since the code matches the step definitions, we similarly don't have to worry about the underlying tests misinterpreting that intent. If they do, it surfaces quickly in failing tests instead of staying buried far beneath the surface.