One of the first things I learned in testing is to use ‘real’ test data. First, a story.
My first job as a tester was verifying financial software, typically software for reporting capital positions to central banks. So there I was, on-site testing one of these systems for a Caribbean trust company (think tax haven for wealthy Canadians), using numbers I could easily calculate in my head or on a scrap of paper as input. Things like $10, $5, $7.50, etc. Of course, at some point the person in charge of the company saw me do this and freaked. And as it turned out, rightly so. Opening an account there takes a large monetary commitment, so their capital positions had many, many more digits than I was using. He was appeased, though still skeptical, with numbers like $100000000.00, $50000000.00 and $75000000.50.
This is ‘real’ test data because it adheres to its internal rules. All data has rules. Even ‘free form text’ has rules. The trick, of course, is figuring out what they are.
The good thing about rules is that once you know them, you can exploit them.
I’ve been around new testers (or people who have been temporarily conscripted to be testers) enough to have noticed that there are patterns in how they create test data. Take a ‘name’ field, for instance. A new tester will often use their own name first, their spouse’s name next, then their kids, the rest of the family, characters from TV shows, and finally movie characters, at which point they get stuck. The trap they have stumbled into is that while they did create data that met the rules of the field, their thinking was influenced by the field label (‘name’).
Let’s pretend these are the rules around the ‘name’ field:
- minimum 2 characters
- maximum 60 characters (that is the size of the database column it will be stuffed into)
- spaces and hyphens are acceptable
- case is preserved in the database
But doesn’t ‘dofdsiiIOIDFk’, ‘dsklfjewojf-k’ and ‘dsfjslkfjl sjflksjdfiew’ also satisfy those rules? Of course they do; they are just hard to pronounce. Guess what? The system doesn’t care. All it cares about is that it is getting something that passes the set of rules that describe it. Once you have this epiphany you can start doing interesting test data generation.
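To make that concrete, here is a rough sketch of what checking those rules might look like; the is_valid_name helper is hypothetical, just a direct encoding of the pretend rules above:

import string

def is_valid_name(value):
    """Return True if value satisfies the pretend 'name' field rules."""
    allowed = set(string.ascii_letters + " -")  # letters, spaces and hyphens
    return 2 <= len(value) <= 60 and all(ch in allowed for ch in value)

print(is_valid_name("dofdsiiIOIDFk"))            # True, pronounceable or not
print(is_valid_name("dsfjslkfjl sjflksjdfiew"))  # True
print(is_valid_name("x"))                        # False: too short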
I’m not sure whether this sits on the fuzzy line between model-based automation and dynamic data-driven testing, but the theory is that you have the script do all the thinking for you. Here is a Python script which will create unique test data forever (or at least long enough that it might as well be forever).
import random
import string

# Characters that satisfy the field's rules: letters, spaces and hyphens
valid = string.ascii_letters + " -"
min_len = 2
max_len = 60

# Pick a length within the rules, then build the name one character at a time
name = []
how_many = random.randint(min_len, max_len)
for _ in range(how_many):
    name.append(random.choice(valid))

print("".join(name))
Here is a sample of what it generated
- sWZidlWaWQ
- EIMZpdFvYzhZINKQoByWWVxbqGXhhIU gp-FZR neMIgZfIaOsn
- cdVaKADxlDJxABlMCF GmpqmyvQThDCUnLjfWp
- LRDSYSroV
Great. So what?
Well, now you can use your brain for thinking up interesting scenarios to test, not test data. In most cases the data is just a means to an end.
You could also make this a function in a module somewhere and have your automation rig call it for test data instead of some hard-coded value. Now you’re really getting somewhere.
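A rough sketch of what that might look like, assuming a hypothetical testdata.py module (the random_name function is just a placeholder, not from any particular framework):

# testdata.py
import random
import string

VALID_CHARS = string.ascii_letters + " -"  # the 'name' field rules again

def random_name(min_len=2, max_len=60):
    """Return a random string that satisfies the 'name' field rules."""
    length = random.randint(min_len, max_len)
    return "".join(random.choice(VALID_CHARS) for _ in range(length))

Then any test that needs a name can do something like name = random_name() instead of hard-coding ‘Bob Smith’.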
As a summary…
- All test data has rules
- Anything that meets those rules is ‘real’
- Rule identification can be hard
- Once you know them, you can exploit them