Given my experience creating automated regression tests, one would think I’d remember the dangers of poorly planned test data. Unfortunately, I tend to get wrapped up in how a feature works and in reaching an initial success, and sometimes forget to check whether the feature behaves differently when the data varies. While I take comfort in not being alone in making this mistake, I still regret not seeing it coming sooner.
With any automation, developers should always prepare a plan for providing reliable and predictable data. Without that data, the automation becomes “flaky” and produces undependable results. As an example, I was recently running a performance test that we had been working with for a few weeks and that had been running smoothly. We decided it was time to start ramping up the load, so we expanded the product data available. Unfortunately, we did not realize that the new products had a setting that caused them to trigger a different process flow. As a result, our tests began failing, and because the data was being selected at random, we couldn’t tell whether we had hit a product limitation or a flaw in the test itself. We lost several hours debugging the situation.
An important lesson I have learned through this and similar experiences is that automation requires a data strategy that accounts for how the tests will be run and what they are meant to verify. In the case above, the solution is to generate a controlled set of products that will follow a predictable path through the software. We will also need to regulate any changes to the data load, or build an automated process that produces the required entries in the required state before each test run. It may sound obvious, but it is more complicated than you might think.
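As a minimal sketch of what that strategy can look like, the snippet below generates a controlled product set and replaces truly random selection with seeded, reproducible selection. The `Product` class, the `alternate_flow` flag, and the names are all hypothetical stand-ins for whatever settings your products actually carry; the point is that every product is created in a known state, and a failed run can be replayed with the same seed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical product record used by the test."""
    sku: str
    # Stand-in for the setting that triggered the different process flow.
    alternate_flow: bool = False


def build_test_products(count: int) -> list[Product]:
    """Generate a controlled data set: every product takes the standard path."""
    return [Product(sku=f"TEST-{i:04d}") for i in range(count)]


def pick_product(products: list[Product], seed: int) -> Product:
    """Select 'randomly' but reproducibly, so a failing run can be replayed."""
    rng = random.Random(seed)
    return rng.choice(products)


products = build_test_products(100)
chosen = pick_product(products, seed=42)
assert not chosen.alternate_flow  # the selected product stays on the predictable path
```

With this in place, a failure can no longer hide behind the data: if a test breaks, you rerun it with the same seed and know immediately whether the product or the test is at fault.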
At least now I know what one of my future articles will be about: developing a data strategy.