When we write interaction-based tests, using mocks or spies, we make it harder to change those interactions later (*). We make it harder to do some kinds of refactoring! But does it matter?

Not necessarily - there are ways to use those test doubles so that the benefits outweigh the cost. But when you use them to write tests for a bad design - for code that’s hard to test - you are doing it wrong.

And your tests will come back and bite you later.

(*) There is a similar but different problem that can occur with state-based tests. Want me to write about that one too? Ping me on Twitter.

Testing for Interactions

First, let’s look at how interaction-based tests work, both with mocks and spies.

Here’s a test that checks whether the code under test interacted with some other code. It is written in TypeScript and uses omnimock.

it('calls the open function when the open callback was called', () => {
    const openMock = mock<()=>{content: string, filePath: string, }>('openMock')
    when(openMock()).return({ content: 'the slides content', filePath: 'dummy', }).once()

    createEditor({ _open: instance(openMock), })

    sendAction('file.open')

    verify(openMock)
})

In the classification from Martin Fowler’s Mocks Aren’t Stubs, the test double openMock used here is a “Mock”: It has the verification logic baked into it. It was configured so that it expects to be called .once().

Here is a similar test written in Java with Mockito:

@Test
void callsTheOpenFunctionWhenTheOpenCallbackWasCalled() {
	FileSystemOperations fsOperations = mock(FileSystemOperations.class);
	Editor editor = new Editor(fsOperations);

	editor.sendAction(EditorActions.OPEN_FILE);

	verify(fsOperations).openFile();
}

The test double used here - fsOperations - behaves more like a “Spy” in the classification above, even though I created it with the function mock(...): It does not have verification logic baked in, but it records how it was used, so the test can verify the use later.

Both tests check whether an interaction has happened: They send a message to the code under test ('file.open' or EditorActions.OPEN_FILE) and then check whether the code under test calls some function on a well-defined interface.
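To make the difference between the two kinds of test double concrete without any library, here is a minimal hand-rolled sketch in TypeScript. All names in it (OpenResult, createOpenSpy, createOpenMock and this simplified createEditor) are hypothetical stand-ins, not the API of omnimock, Mockito or marmota.app.

```typescript
// Hypothetical types and names, for illustration only.
type OpenResult = { content: string; filePath: string };
type OpenFn = () => OpenResult;

// A spy records how it was used; the test decides afterwards what to check.
function createOpenSpy(result: OpenResult) {
  let callCount = 0;
  return {
    open: (): OpenResult => {
      callCount += 1;
      return result;
    },
    callCount: (): number => callCount,
  };
}

// A mock carries its expectation with it and checks it in verify().
function createOpenMock(result: OpenResult, expectedCalls: number) {
  let callCount = 0;
  return {
    open: (): OpenResult => {
      callCount += 1;
      return result;
    },
    verify: (): void => {
      if (callCount !== expectedCalls) {
        throw new Error(`expected ${expectedCalls} call(s), got ${callCount}`);
      }
    },
  };
}

// Stand-in for the code under test: calls open() when it receives 'file.open'.
function createEditor(deps: { open: OpenFn }) {
  return {
    sendAction(action: string): void {
      if (action === 'file.open') {
        deps.open();
      }
    },
  };
}

const canned: OpenResult = { content: 'the slides content', filePath: 'dummy' };

// Spy style: act first, then look at what was recorded.
const openSpy = createOpenSpy(canned);
createEditor({ open: openSpy.open }).sendAction('file.open');
const spyCallCount = openSpy.callCount();

// Mock style: the expectation (exactly one call) was fixed up front.
const openMock = createOpenMock(canned, 1);
createEditor({ open: openMock.open }).sendAction('file.open');
openMock.verify(); // throws if open() was not called exactly once
```

In both styles the test ends up coupled to the same fact: the code under test calls open(). Only the place where the expectation lives differs.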

I would not necessarily consider this use of mock objects bad practice. The TypeScript version of this test is actually from marmota.app, an application I am currently working on.

I could theoretically rewrite this test as a state-based test, though:

@Test
void callsTheOpenFunctionWhenTheOpenCallbackWasCalled() {
	FileSystemOperations fsOperations = mock(FileSystemOperations.class);
	when(fsOperations.openFile()).thenReturn("Content of the opened file");
	Editor editor = new Editor(fsOperations);

	editor.sendAction(EditorActions.OPEN_FILE);

	assertThat(editor.getEditorContent()).isEqualTo("Content of the opened file");
}

Even though I still call the function mock(...), this test is not using mocks or spies. Here, fsOperations is used as a stub that always returns canned answers, and the test can then assert that the code under test is in a certain end state.

But then I would have to expose some internal state of the editor, getEditorContent(), which I did not want to do in this case.

Mocks make Code Harder to Change

The interaction-based tests above make some future changes to the code harder while keeping other changes easy. Let’s look at some possible future changes…

Renaming the functions: Suppose I want to rename open to something else. This can be done with an automated refactoring - easy.

Adding new locations: When I want to add cloud storage in addition to the file system operations, I can keep all the tests. I would only have to create a second implementation of open() and the other operations and supply them to the production code. There is some work to do, but this change is not very hard.

Changing who calls the functions: Suppose the code for Editor is getting too big and I want to move open() and the other operations to a different class or component. I would not only have to move around a lot of production code but also a lot of test code if I want to test both things in isolation. This change is harder but still doable. And even though I’ll have to change a lot of test code, the tests are still valuable - they will guide me to take all the necessary steps.

Changing an interaction completely: Suppose open() should not be called at all anymore - the code should now call two or three different functions with the same effect instead. This refactoring is made harder by using test doubles because I not only have to move test code around, I have to rewrite all tests. The names of the tests can still guide me, but the code in the tests has little value.

Changing dependencies completely: The code under test became too complicated because it has too many dependencies. I want to introduce a new abstraction or move dependencies higher up in the hierarchy. This is basically a combination of the last two points. Since all my tests rely on which dependencies the Editor has and how Editor interacts with those, this change has become really hard.
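The case "Changing an interaction completely" can be sketched in a few lines of TypeScript. Everything here is a hypothetical illustration: the FileOperations interface, the two editor versions and the helper names are assumptions, and the state-based check relies on a getContent() accessor that the real editor deliberately does not expose.

```typescript
// Hypothetical dependency of the editor. `open` is the old single-step
// operation; `pickFile` and `readFile` together have the same effect.
interface FileOperations {
  open(): string;
  pickFile(): string;
  readFile(path: string): string;
}

interface SlideEditor {
  sendAction(action: string): void;
  getContent(): string;
}

// Before the change: one call to open().
function createEditorV1(ops: FileOperations): SlideEditor {
  let content = '';
  return {
    sendAction(action: string): void {
      if (action === 'file.open') content = ops.open();
    },
    getContent: (): string => content,
  };
}

// After the change: two calls, same observable result.
function createEditorV2(ops: FileOperations): SlideEditor {
  let content = '';
  return {
    sendAction(action: string): void {
      if (action === 'file.open') content = ops.readFile(ops.pickFile());
    },
    getContent: (): string => content,
  };
}

// Test double that returns canned answers and records every call by name.
function createRecordingOps() {
  const calls: string[] = [];
  const ops: FileOperations = {
    open: () => { calls.push('open'); return 'the slides content'; },
    pickFile: () => { calls.push('pickFile'); return 'dummy'; },
    readFile: (path) => { calls.push(`readFile(${path})`); return 'the slides content'; },
  };
  return { ops, calls };
}

type EditorFactory = (ops: FileOperations) => SlideEditor;

// Interaction-based check: open() must have been called exactly once.
function interactionTestPasses(create: EditorFactory): boolean {
  const { ops, calls } = createRecordingOps();
  create(ops).sendAction('file.open');
  return calls.filter((call) => call === 'open').length === 1;
}

// State-based check: only the resulting content matters.
function stateTestPasses(create: EditorFactory): boolean {
  const { ops } = createRecordingOps();
  const editor = create(ops);
  editor.sendAction('file.open');
  return editor.getContent() === 'the slides content';
}

const outcome = {
  interactionV1: interactionTestPasses(createEditorV1), // true
  interactionV2: interactionTestPasses(createEditorV2), // false: open() is no longer called
  stateV1: stateTestPasses(createEditorV1),             // true
  stateV2: stateTestPasses(createEditorV2),             // true
};
```

The behavior of the editor is the same in both versions, yet the interaction-based check has to be rewritten. The state-based check survives, at the price of exposing internal state.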

Mocks and Legacy Code

When legacy code is hard to test automatically, it is usually because

  • It is hard to “see” what is happening: The test cannot check the end state of the code under test, because that code does not expose any state.
  • It is hard to set up the code under test, because it has too many, too complicated dependencies, each of which is hard to set up and hard to test.

Test doubles, especially mocks and spies, can help us bring this code under test (but this is a bad idea, see below):

  • Stub all the hard-to-set-up dependencies.
  • Verify all interactions with those dependencies, since we cannot assert the end state.
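The two steps above could look roughly like the following TypeScript sketch. The legacy function saveDocument and its three dependencies are invented for illustration; the point is only the shape of the test: one dependency is stubbed with a canned answer, and every other interaction is recorded and compared against an expected list.

```typescript
// Hypothetical dependencies of a legacy function. In real legacy code,
// each of these would be hard to construct.
interface EditorSettings { defaultDirectory(): string; }
interface DiskWriter { write(path: string, content: string): void; }
interface StatusDisplay { show(message: string): void; }

// Hypothetical legacy code: returns nothing and exposes no state.
function saveDocument(
  fileName: string,
  content: string,
  settings: EditorSettings,
  writer: DiskWriter,
  display: StatusDisplay,
): void {
  const path = `${settings.defaultDirectory()}/${fileName}`;
  writer.write(path, content);
  display.show(`Saved ${path}`);
}

// A shared log records every interaction, in order.
const interactionLog: string[] = [];

// Stub: returns a canned answer instead of reading real settings.
const settingsStub: EditorSettings = {
  defaultDirectory: () => '/docs',
};

// Spies: record the calls, since there is no end state to assert on.
const writerSpy: DiskWriter = {
  write: (path, content) => { interactionLog.push(`write ${path}: ${content}`); },
};
const displaySpy: StatusDisplay = {
  show: (message) => { interactionLog.push(`show ${message}`); },
};

saveDocument('slides.md', '# Title', settingsStub, writerSpy, displaySpy);

// The "assertion": the exact list of interactions, including their order.
const expectedLog = [
  'write /docs/slides.md: # Title',
  'show Saved /docs/slides.md',
];
const allInteractionsVerified =
  interactionLog.length === expectedLog.length &&
  interactionLog.every((entry, index) => entry === expectedLog[index]);
```

Note how much of the function's internal structure the expected list encodes: which dependencies exist, which of their functions are called, with which arguments, and in which order.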

With little effort, we can reach a high test coverage that will even protect us from regressions: Those tests will prevent changes that alter the behavior of the code. So there is some value in those tests.

But the code we just brought under test is badly designed. It has too many, too complicated dependencies. There is probably an abstraction missing and we should also probably split it into many smaller modules.

This would mean changing dependencies completely, but that was made really hard by the tests we just wrote.

The tests not only prevent us from creating regressions, they also prevent refactoring: They do not allow us to make changes to the code that do not change the behavior!

To Recap…

Interaction-based tests with test doubles (stubs, mocks, spies) can make future refactoring harder or even next to impossible, especially when they are used to bring bad code - code with too many, too complicated dependencies - under test, because those tests “lock in” the bad design.

But that does not mean that those techniques are bad per se. When you use them

  • when you already have a relatively clean design
  • to test for interactions that absolutely must happen
  • to test what happens on “both sides” of a well-defined, stable interface

they can allow you to write better code and tests that are easier to understand.
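One possible reading of testing on “both sides” of an interface, sketched in TypeScript: the caller is tested with a spy that stands in for the interface, and an implementation of the interface is tested directly, without test doubles. The DocumentStorage interface, the in-memory implementation and this simplified createEditor are hypothetical and only serve the illustration.

```typescript
// Hypothetical, deliberately small interface between editor and storage.
interface DocumentStorage {
  open(): string;
}

// Side 1: the caller. It only knows the interface.
function createEditor(storage: DocumentStorage) {
  return {
    sendAction(action: string): void {
      if (action === 'file.open') storage.open();
    },
  };
}

// Spy that stands in for the interface when testing the caller.
function createStorageSpy() {
  let openCalls = 0;
  const storage: DocumentStorage = {
    open: () => {
      openCalls += 1;
      return '';
    },
  };
  return { storage, openCalls: (): number => openCalls };
}

// Side 2: one implementation of the interface, here backed by a plain object.
function createInMemoryStorage(
  files: Record<string, string>,
  selectedPath: string,
): DocumentStorage {
  return {
    open: () => {
      const content = files[selectedPath];
      if (content === undefined) throw new Error(`no such file: ${selectedPath}`);
      return content;
    },
  };
}

// Caller side: interaction-based, checks that the interface is used.
const storageSpy = createStorageSpy();
createEditor(storageSpy.storage).sendAction('file.open');
const callerUsesInterface = storageSpy.openCalls() === 1;

// Implementation side: state-based, checks what open() actually returns.
const inMemory = createInMemoryStorage({ 'slides.md': '# Title' }, 'slides.md');
const implementationHonorsInterface = inMemory.open() === '# Title';
```

As long as the interface stays stable, either side can be refactored freely without touching the tests of the other side.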

But before you write an interaction-based test, think about which future changes this test will make easier and which changes it will make harder or prevent.

See also: Value and Cost of Tests where I discuss this topic in a more general way.