Arminas Katilius Personal Blog

Object creation techniques for tests

October 28, 2019

In a previous post I mentioned a few ideas on how to create objects for tests. Here I want to go through some new ideas in more detail. Code snippets are in TypeScript, but the techniques can be applied in any language.

Important thing first

Before applying any of the techniques mentioned here, rethink the class structure and how objects are created in production code. The famous saying that tests help you create better-designed code applies here as well. Quite often, if it’s inconvenient to create objects in test code, it’s hard to do the same in production code too. After refactoring, you might not need any additional techniques. Common refactorings are: simplifying object structure, adding meaningful constructors, adding static factory methods, and adding object builders.
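As a minimal sketch of one of these refactorings, here is what a static factory method could look like. The Account class and its fields are hypothetical and exist only to illustrate the idea:

```typescript
// Hypothetical domain class, used only for illustration.
class Account {
    private constructor(
        readonly ownerName: string,
        readonly balance: number,
        readonly currency: string,
    ) {}

    // Static factory method: names the intent and hides default values.
    static empty(ownerName: string, currency: string = "EUR"): Account {
        return new Account(ownerName, 0, currency);
    }

    // A second factory that validates its input before constructing.
    static withBalance(ownerName: string, balance: number, currency: string = "EUR"): Account {
        if (balance < 0) {
            throw new Error("Balance cannot be negative");
        }
        return new Account(ownerName, balance, currency);
    }
}

const emptyAccount = Account.empty("Jane Doe");
const fundedAccount = Account.withBalance("John Doe", 150);
```

With factories like these, both production code and tests can create a valid object in one readable line, which often removes the need for extra test helpers.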

Test Data creator for integration tests

If you find yourself creating similar groups of objects in multiple tests, you might find this technique useful. I usually avoid initial or seed data for tests and create new data for each test. This makes a large set of tests easier to maintain as tests become more predictable and more isolated. But it comes with a cost. It becomes harder to prepare setup data, because you need to create the same objects over and over for each test case.

For example, you have a banking system, and many tests use an Account object. But to insert an Account object into the database, you need to create both Person and Manager objects and persist them first. In an application using repositories for persistence, this means you need to inject at least 3 repositories into your test just for data creation. And it gets worse with more types.

To solve that, you can group similar functionality and create a TestDataCreator class that looks something like this:

class BankTestDataCreator {
    personRepository: PersonRepository;
    managerRepository: ManagerRepository;
    accountRepository: AccountRepository;
    
    createEmptyAccount() {
       // Create and insert person
       // Create and insert manager
       // Create account, assign manager and person, and insert
       return createdAccount;
    }
}

This creator spans multiple domain classes and creates and inserts objects into the database. If an application is relatively small, a single test data creator might be enough. For a larger application, you should split them by similar functionality. I often use test data creators together with fake data generation libraries, which I mention in another section.
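A minimal runnable sketch of such a creator could look like the following. The Person, Manager and Account shapes and the InMemoryRepository class are assumptions made for illustration. A real project would use its own domain classes and database-backed repositories:

```typescript
// Hypothetical stand-ins for real domain classes.
interface Person { id?: number; fullName: string; }
interface Manager { id?: number; fullName: string; }
interface Account { id?: number; ownerId: number; managerId: number; balance: number; }

// Minimal in-memory repository; a real one would talk to the database.
class InMemoryRepository<T extends { id?: number }> {
    private readonly items: T[] = [];
    private nextId = 1;

    // Stores a copy of the item with a generated id and returns it.
    save(item: T): T & { id: number } {
        const saved = { ...item, id: this.nextId++ };
        this.items.push(saved);
        return saved;
    }

    count(): number {
        return this.items.length;
    }
}

class BankTestDataCreator {
    constructor(
        private readonly personRepository: InMemoryRepository<Person>,
        private readonly managerRepository: InMemoryRepository<Manager>,
        private readonly accountRepository: InMemoryRepository<Account>,
    ) {}

    // Creates and persists everything an account depends on, then the account itself.
    createEmptyAccount(): Account & { id: number } {
        const person = this.personRepository.save({ fullName: "Test Person" });
        const manager = this.managerRepository.save({ fullName: "Test Manager" });
        return this.accountRepository.save({
            ownerId: person.id,
            managerId: manager.id,
            balance: 0,
        });
    }
}

const personRepo = new InMemoryRepository<Person>();
const managerRepo = new InMemoryRepository<Manager>();
const accountRepo = new InMemoryRepository<Account>();
const bankCreator = new BankTestDataCreator(personRepo, managerRepo, accountRepo);
const createdAccount = bankCreator.createEmptyAccount();
```

The test itself only needs the creator injected. The three repositories stay an implementation detail of the test data setup.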

Test data creator that returns builder instance

You might have a nice builder for your object, but you keep defining groups of fields with similar meaning. For example, you have a car rental system and you keep creating a high-end car object for your tests:

Car.builder()
   .withPrice(300_000)
   .withManufacturer("Lamborghini")
   .withModel("Gallardo")
   .withYear(2019)
   .build();

This duplication is easily solved with a utility function createLamborghiniGallardo, and that might be fine in most cases. But in some tests you might need to modify some of the values. If your object is immutable (and if you are using a builder pattern, it probably is), you can’t do that. The fix is easy: instead of returning the created object, return the builder instance, which allows modifications:

    carCreator.createLamborghiniGallardo().withYear(2017).build()

This is a powerful technique, but it should be used carefully. Having too many meaningful values hidden in test data creators can make test code harder to understand. It should always be clear what kind of data is created and how it is used when asserting if code works correctly.
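To make the idea concrete, here is a minimal sketch. The Car and CarBuilder classes and the carCreator object are assumptions written for illustration. A real builder may look different:

```typescript
// Hypothetical immutable Car with a builder, mirroring the snippets above.
class Car {
    constructor(
        readonly price: number,
        readonly manufacturer: string,
        readonly model: string,
        readonly year: number,
    ) {}

    static builder(): CarBuilder {
        return new CarBuilder();
    }
}

class CarBuilder {
    private price = 0;
    private manufacturer = "";
    private model = "";
    private year = 0;

    withPrice(price: number): this { this.price = price; return this; }
    withManufacturer(manufacturer: string): this { this.manufacturer = manufacturer; return this; }
    withModel(model: string): this { this.model = model; return this; }
    withYear(year: number): this { this.year = year; return this; }

    build(): Car {
        return new Car(this.price, this.manufacturer, this.model, this.year);
    }
}

const carCreator = {
    // Returns the builder, not the built Car, so a test can override single fields.
    createLamborghiniGallardo(): CarBuilder {
        return Car.builder()
            .withPrice(300_000)
            .withManufacturer("Lamborghini")
            .withModel("Gallardo")
            .withYear(2019);
    },
};

const defaultCar = carCreator.createLamborghiniGallardo().build();
const olderCar = carCreator.createLamborghiniGallardo().withYear(2017).build();
```

Each call creates a fresh builder, so overriding the year in one test does not leak into another.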

Allow modifying data before inserting/creating

This technique is especially useful in combination with a test data creator that inserts objects into the database. You have a createAccount method that creates a generic Account object, but for most tests the generic one is not enough: you need to change the account name, edit contact info, and so on. One solution could be to split Account creation from insertion. But that means that if the Account database object depends on 3 other objects, you would need to inject 3 repositories into your test. A simpler solution is to keep the data creator as is and add an overloaded method that accepts a modifying function. The snippet below shows the idea with a Contract object:

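function createContract(changeBeforeSave: (contract: Contract) => void) {
    // createRandomContract is a placeholder for whatever builds a contract with randomized values
    const contract = createRandomContract();
    // Let the test adjust the object before it is persisted
    changeBeforeSave(contract);
    return contractRepo.save(contract);
}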

Then in the test:

it('keeps contract name the same', () => {
    const contract = testContractCreator.createContract(c => {
        c.setName("Test 234");
    })
    // ..remaining test code
})
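In TypeScript, the "overloaded method" part can be expressed with a default parameter, so tests that are happy with the generic object can skip the callback. The following is a minimal sketch under stated assumptions: the Contract class and the in-memory contractRepo are hypothetical stand-ins for real code:

```typescript
// Hypothetical Contract and repository, for illustration only.
class Contract {
    constructor(private contractName: string) {}

    getName(): string { return this.contractName; }
    setName(newName: string): void { this.contractName = newName; }
}

const savedContracts: Contract[] = [];
const contractRepo = {
    // Stands in for a real database insert.
    save(contract: Contract): Contract {
        savedContracts.push(contract);
        return contract;
    },
};

// The default no-op callback plays the role of the overload without arguments.
function createContract(changeBeforeSave: (contract: Contract) => void = () => {}): Contract {
    const contract = new Contract(`Contract ${Math.floor(Math.random() * 10_000)}`);
    changeBeforeSave(contract);
    return contractRepo.save(contract);
}

const genericContract = createContract();
const renamedContract = createContract(c => c.setName("Test 234"));
```

The modification happens before the save, so the persisted object already contains the values the test cares about.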

Use fake data generators

I was once skeptical about using random data in tests, because it can create brittle and flaky tests. But applied correctly, it can reduce the amount of code and even catch bugs that would be harder to catch with hardcoded data alone. What matters is that assertion failures are clear and that it’s easy to find out which data the test failed with.

One concrete example where randomized data helps is testing object mappers, where you have 2 formats of the same data that have to be mapped back and forth. Using a test data generator, you could write the following test:

describe('Mapper', () => {
   it('maps between A and B formats without losing data', () => {
     const formatA = createRandomObjectInFormatA();
     const formatB = mapper.mapToFormatB(formatA);
     const mappedFormatA = mapper.mapToFormatA(formatB);
       
     expect(formatA).toEqual(mappedFormatA);  
   }) 
});

Of course, you still need test cases with static data that describe specific cases of the mapping logic. But the generated part helps to find unexpected bugs. The test becomes a bit less predictable, but as long as test failures have enough detail (they have to clearly show what data was given and what was expected), failures are easy to replicate.
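One way to make a randomized failure easy to replicate is to derive all random values from a seed and report that seed when a test fails. The sketch below is only an illustration: the FormatA shape is made up, createRandomObjectInFormatA takes a random source as a parameter (an assumption, the test above calls it without arguments), and the generator is the small, widely shared mulberry32 routine rather than anything from a specific library:

```typescript
// Small seeded pseudo-random generator (mulberry32); returns values in [0, 1).
function seededRandom(seed: number): () => number {
    let state = seed >>> 0;
    return () => {
        state = (state + 0x6D2B79F5) >>> 0;
        let t = state;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Hypothetical data format, for illustration only.
interface FormatA {
    id: number;
    amount: number;
    label: string;
}

// All randomness comes from the injected source, so a seed fully determines the object.
function createRandomObjectInFormatA(random: () => number): FormatA {
    return {
        id: Math.floor(random() * 1_000_000),
        amount: Math.floor(random() * 10_000) / 100,
        label: `label-${Math.floor(random() * 1000)}`,
    };
}

// In a real test the seed could come from the clock and be printed on failure.
const seed = 12345;
const firstRun = createRandomObjectInFormatA(seededRandom(seed));
const secondRun = createRandomObjectInFormatA(seededRandom(seed));
```

Running the generator twice with the same seed produces the same object, so a failing run can be reproduced by reusing the reported seed.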

Also, it is important to mention that the data does not have to be fully random. Many fake data libraries let you put meaningful restrictions on the data: generating a number from a range, generating a date in the past, generating a meaningful email, etc. Another application is simply creating test data objects. A few good examples: