Thursday, September 8, 2011

Mocking snakes

Many years ago, I was tasked with improving the performance of a suite of unit tests. They were taking ever longer to run, and were beyond 20 minutes when I started working with them. Needless to say, this meant people rarely ran them.
From Miško Hevery:
What I want is for my IDE to run my tests every time I save the code. To do this my tests need to be fast, because my patience after hitting Cntl-S is about two seconds. Anything longer than that and I will get annoyed. If you start running your tests after every save from test zero you will automatically make sure that your test will never become slow, since as soon as your tests start to run slow you will be forced to refactor your tests to make them faster.
The problem was that every test was descended from a common base class, and that class brought up fake versions of most of the application. Well, mostly fake versions. There was still a lot of I/O and network activity involved in bringing up these fakes.

The solution turned out to be mock objects, JMock in this particular case. For those unfamiliar, mock objects are objects that can stand in for your dependencies, and can be programmed to respond in the particular manner necessary for whatever it is you are testing. So if your network client is supposed to return "oops" every time the network connection times out, you can use a mock to stand in for the network connection, rather than relying on lady fortune to drop the network for you (or doing something terrible, like having code in your test that disables your network interface).
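For instance, here's a minimal sketch of that timeout scenario using the Mock library (covered below); NetworkClient and its methods are hypothetical names of my own, not part of any library:

import socket
import mock

class NetworkClient(object):
    def __init__(self, connection):
        self.connection = connection

    def get_page(self, url):
        try:
            return self.connection.fetch(url)
        except socket.timeout:
            return "oops"

def test_timeout_returns_oops():
    # Programme the mock: every fetch() raises socket.timeout.
    connection = mock.Mock()
    connection.fetch.side_effect = socket.timeout
    client = NetworkClient(connection)
    assert client.get_page("http://example.com") == "oops"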

There are a couple of general drawbacks to using mock objects, but the primary one is that a mock object only knows what you tell it. If the interface of a dependency changes, your mock will not know this, and your tests will continue to pass. This is why it is key to have higher-level tests, run less frequently, that exercise the actual interfaces between objects, not just the interfaces you have trained your mocks to have.
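To make that concrete, here's a sketch (with hypothetical Clock and Alarm classes) of a test that keeps passing after the real interface has changed:

import mock

class Clock(object):
    def now_utc(self):  # renamed from now(); real callers of now() now break
        return 0

class Alarm(object):
    def __init__(self, clock):
        self.clock = clock

    def check(self):
        return self.clock.now()  # stale call: fails against a real Clock

def test_check_still_passes():
    m_clock = mock.Mock()
    Alarm(m_clock).check()  # no error: the mock happily accepts now()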

The other drawbacks have more to do with verbosity and code structure than anything else. In order for a mock to be useful, you need a way to tell your code under test which dependency it is standing in for. In my code, this tends to lead to far more verbose constructors that detail every dependency of the object. But there are other mechanisms, which I will explore here.

For a more verbose comparison of mock libraries in a variety of use cases, check this out:


Hopefully this post will be a more opinionated supplement to that.

There are a couple of categories of things to mock:
  • Unreliable dependencies (network, file system)
  • Inconsistent dependencies (time-dependent functionality)
  • Performance-impacting dependencies (pickling, hashing functions, perhaps)
  • Calls to the object under test
Mocking the last item is certainly not a necessity, but it does come in handy when testing an object with a bunch of methods that call each other. I'll refer to it as "partial mocking" here.

For this article, I'm going to focus on four mock object libraries: Mocker, Flexmock, Fudge, and Mock. The first three were chosen primarily because they are the ones I have experience with; I added Mock as well, though I don't have much experience with it yet. I believe, from my more limited experience with other libraries, that these provide a decent representation of different approaches to mocking challenges.

I'm going to go through common use cases, how each library handles them, and my comments on that. One important note is that I generally don't (and won't here) differentiate between mocks, stubs, spies, etc.

Getting a mock


"""Mocker:
Mocks are obtained from a mocker instance, which is created
automatically if you create a test case that inherits from
MockerTestCase.
"""
from mocker import Mocker, MockerTestCase
class TestCaseForMocker(MockerTestCase):
def test_something(self):
a_mock = self.mocker.mock()
def mocker_test():
mocker = Mocker()
a_mock = mocker.mock()
"""Flexmock:
Mocks are obtained by calling flexmock(), which can take a
variety of arguments to further detail their behaviour."""
from flexmock import flexmock
def flexmock_test():
a_mock = flexmock()
"""Fudge:
There are two ways mocks are obtained in Fudge. The neater
way, which I'll call the "implicit" way, is with the
fudge.patch decorator, which performs both creation of the mock,
and substituting it into the namespace of the test."""
from unittest import TestCase
import fudge
class TestCaseForFudge(TestCase):
@fudge.patch("time.sleep")
def test_something(self, a_mock):
pass
#Then there's the explicit way:
def fudge_test():
a_mock = fudge.fake()
"""Mock:
mock.Mock() gives you a mock.
"""
import mock
def mock_test():
a_mock = mock.Mock()



Dependencies are usually injected in the constructor, in a form like the following:
Gist "Verbose dependency specification for mocking"


import time

class SomeClass(object):  # hypothetical name for the enclosing class
    def __init__(self, time_mod=time):
        self.time = time_mod

    def some_method(self):
        self.time.sleep(10)


This is verbose, especially for real objects, which tend to accumulate many dependencies once you start counting standard library modules among them. :)

NOTE: Not all standard library modules need to be mocked out. Things like os.path.join or date formatting operations are entirely self-contained, and shouldn't introduce significant performance penalties. As such, I tend not to mock them out. That does introduce the unfortunate situation where I will have a call to a mocked-out os.path on one line, and a call to the real os.path on the next:
Gist: "Confusion when not everything is mocked"


def delete_file(self, name):
    path = os.path.join(BASE_DIR, name)  # real os.path: self-contained
    if self.os.path.exists(path):  # mocked os.path: touches the file system
        os.remove(path)


This can certainly be a bit confusing at times, but I don't yet have a better solution.

However, it is quite explicit, and avoids the need for a dependency injection framework. Not that there's anything wrong with using such a framework, but doing so steepens the learning curve for your code.
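To tie those pieces together, here's a sketch of injecting a mock through the constructor shown above, using Mock (SomeClass is the hypothetical class from the earlier gist):

import mock

def test_some_method_sleeps():
    m_time = mock.Mock()
    obj = SomeClass(time_mod=m_time)  # inject the mock in place of the time module
    obj.some_method()
    m_time.sleep.assert_called_once_with(10)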

Verifying Expectations


One key aspect of using mock objects is ensuring that they are called in the ways you expect. This functionality can make test-driven development very straightforward: by working out how your object will need to interact with its dependencies, you can be sure that the interface you are implementing on those dependencies reflects the reality of how it will be used. For this and more, read Mock Roles, Not Objects, by Steve Freeman and Nat Pryce.

...anyway, verification takes different forms across libraries.
Gist: "Verifying mock expectations"

"""Mocker:
Setting expectations with Mocker is as simple as calling the
methods on the mocks, and then switching the mocker instance
into replay mode."""
from mocker import Mocker
def mocker_test():
mocker = Mocker()
a_mock = mocker.mock()
a_mock.do_something()
mocker.replay()
a_mock.do_something()
mocker.verify() #Unnecessary within MockerTestCases
"""FlexMock:
Setting expectations with FlexMock is quite straightforward, as
long as you are using a test runner that FlexMock integrates
with, otherwise you need to verify manually."""
from flexmock import flexmock
def flexmock_test():
a_mock = flexmock()
expectation = a_mock.should_receive("do_something").once
#once is necessary to generate an expectation
a_mock.do_something()
expectation.verify()#Usually done automatically
"""Fudge:
fudge.verify() can be used to check whether mocks have been used
as expected. When used within a decorated (@patch or @test)
method, this is called automatically.
"""
import fudge
def fudge_test():
stub = fudge.Fake()
stub.expects("do_something")
stub.do_something()
fudge.verify() #Implicit within a decorated method.
"""Mock:
Mock supports verifying expectations by interrogating the mocks
afterwards. This can be rather verbose, but it does force you to
take verification seriously, which is good."""
import mock
def mock_test():
a_mock = mock.Mock()
a_mock.do_something()
a_mock.do_something.assert_called_once_with()


Partial Mocks

Partial mocking is a pretty useful way to ensure your methods are tested independently from each other, and while it is supported by all of the libraries tested here, some make it much easier to work with than others.
Gist "Partial mocks"
# Need to create this because mocker.patch() and flexmock(object)
# don't work with builtins.
class Something(object):
    def do_something(self):
        pass

"""Mocker:
Mocker has two methods for partial mocking.
mocker.proxy() will return a mock object that will forward
unmatched calls to the original object.
mocker.patch() is similar, but at replay() time, modifies the
original object with whatever expectations have been set up on
it."""
from mocker import Mocker

def mocker_test_proxy():
    mocker = Mocker()
    obj = Something()
    partial = mocker.proxy(obj, spec=None)
    partial.do_something()
    mocker.replay()
    partial.do_something()

def mocker_test_patch():
    mocker = Mocker()
    obj = Something()
    partial = mocker.patch(obj, spec=None)
    partial.do_something()
    mocker.replay()
    obj.do_something()

"""Flexmock:
Flexmock uses flexmock(original) for partial mocking.
If an expectation is set up that does not match an existing
method on the original object, flexmock.MethodDoesNotExist
is raised.
"""
from flexmock import flexmock

def flexmock_test():
    obj = Something()
    flexmock(obj)
    obj.should_receive("do_something")
    obj.do_something()

"""Fudge:
Fudge has a bit of a roundabout approach to partial mocks.
First you need to create a mock object, then create a
PatchHandler with the original class and the method to mock,
and use that to patch() in the mock you created. Finally you
need to restore() the PatchHandler.
There is also a shortcut using a context manager, which is much
cleaner, but doesn't scale well if you need to mock out multiple
calls."""
import fudge
from fudge.patcher import PatchHandler

def fudge_test():
    a_mock = fudge.Fake().is_callable()
    patch = PatchHandler(Something, "do_something")
    patch.patch(a_mock)
    obj = Something()
    obj.do_something()
    patch.restore()

def fudge_test_cm():
    with fudge.patched_context(Something, 'do_something',
                               fudge.Fake().is_callable()):
        obj = Something()
        obj.do_something()

"""Mock:
You can create partial mocks with Mock by creating mocks and
assigning them to the attributes you want to mock out. This
seems like a fairly limited approach, because it overrides what
was originally in that attribute."""
import mock

def test_mock():
    obj = Something()
    obj.do_something = mock.Mock()
    obj.do_something()

Chaining Attributes and Methods

I'm of the opinion that chained attributes are generally indicative of poor separation of concerns, so I don't place too much weight on how the different libraries handle them. That said, I've certainly had need of this functionality when dealing with a settings tree, where it can be much easier to just create a mock if you need to access settings.a.b.c.
Chained methods are sometimes useful (especially if you use SQLAlchemy), as long as they don't impair readability.
Gist "Chaining methods and attributes"
"""Mocker:
Chaining attribute or method access in Mocker is trivial. During
the record phase, every mock operation returns a mock."""
from mocker import Mocker
def mocker_test():
mocker = Mocker()
a_mock = mocker.mock()
a_mock.clock().block.shock()
mocker.replay()
a_mock.clock().block.shock()
"""Flexmock:
Flexmock has a shortcut for simple chained method calls, but a
verbose syntax for chained attributes, or complex method calls.
"""
from flexmock import flexmock
def flexmock_test_attributes():
a_mock = flexmock(clock=flexmock(block=flexmock(
shock=lambda:None)))
a_mock.clock.block.shock()
def flexmock_test_methods():
a_mock = flexmock()
a_mock.should_receive("clock.block.shock")
a_mock.clock().block().shock()
def flexmock_test_methods_complex():
a_mock = flexmock()
a_mock.should_receive("clock").with_args("a").and_return(
flexmock().should_receive("block").with_args("b")\
.and_return(flexmock().should_receive("shock").mock
).mock
)
a_mock.clock("a").block("b").shock()
"""Fudge:
Fudge has a somewhat verbose method of dealing with chained
methods and attributes."""
import fudge
def fudge_test():
a_mock = fudge.Fake()
a_mock.provides("clock").returns_fake().has_attr(
block=fudge.Fake().provides("shock"))
a_mock.clock().block.shock()
"""Mock:
Mock provides a very concise syntax for chaining methods and
attributes.
"""
import mock
def mock_test():
a_mock = mock.Mock()
m_shock = a_mock.clock.return_value.block.shock
a_mock.clock().block.shock()
m_shock.assert_called_once_with()



Failures

An important part of any testing tool is how informative it is when things break down. I'm talking about the detail of error messages, traceability, etc. There are a couple of errors I can think of that are pretty common. For brevity, I'm only going to show the actual error message, not the entire traceback.

Note: Mock is a bit of an odd duck in these cases, because it lets you do literally anything with a mock. It does have assertions you can use afterwards for most cases, but if an unexpected call is made on your mock, you will not receive any errors. There's probably a way around this.
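One such way, I believe, is Mock's spec argument, which restricts a mock to the attributes of whatever object you hand it:

import time
import mock

m_time = mock.Mock(spec=time)
m_time.sleep(6)  # fine: time.sleep is part of the spec
m_time.sloop(6)  # raises AttributeError: Mock object has no attribute 'sloop'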

Arguments don't match expectations, such as when we call time.sleep(4) when our expectation was set up for 6 seconds:
  • Mocker: MatchError: [Mocker] Unexpected expression: m_time.sleep(4)
  • Flexmock: InvalidMethodSignature: sleep(4)
  • Fudge: AssertionError: fake:time.sleep(6) was called unexpectedly with args (4)
  • Mock: AssertionError: Expected call: sleep(6)
    Actual call: sleep(4)
When I first encountered Flexmock's InvalidMethodSignature, it threw me off; I think it could certainly be expanded upon. Otherwise, Mock and Fudge have very nice messages, and as long as you know what was supposed to happen, Mocker's is perfectly sufficient.
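For reference, here's a sketch of how the Mock variant of that mismatch is produced (the other libraries follow the expectation syntax shown earlier):

import mock

m_time = mock.Mock()
m_time.sleep(4)  # the code under test slept for 4 seconds...
m_time.sleep.assert_called_once_with(6)  # ...but we expected 6, so this raises
# AssertionError: Expected call: sleep(6)
# Actual call: sleep(4)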

Unexpected method called, such as when you misspell "sleep":
  • Mocker: MatchError: [Mocker] Unexpected expression: m_time.sloop
  • Flexmock: AttributeError: 'Mock' object has no attribute 'sloop'
  • Fudge (patched time.sleep): AttributeError: 'module' object has no attribute 'sloop'
  • Fudge: AttributeError: fake:unnamed object does not allow call or attribute 'sloop' (maybe you want Fake.is_a_stub() ?)
  • Mock: AssertionError: Expected call: sleep(6)
    Not called
Mock doesn't tell you that an unexpected method was called. Mocker has what I consider the best implementation here, because it names the mock the call was made on. The second Fudge variant is good, but because you might encounter it or the first variant depending on context, Fudge overall is my least favourite for this. Flexmock simply defers handling this to Python.

Expected method not called:
  • Mocker: AssertionError: [Mocker] Unmet expectations:
    => m_time.sleep(6)
     - Performed fewer times than expected.
  • Flexmock: MethodNotCalled: sleep(6) expected to be called 1 times, called 0 times
  • Fudge: AssertionError: fake:time.sleep(6) was not called
  • Mock: AssertionError: Expected call: sleep(6)
    Not called
I think they all do pretty well for this case, which is good, because it's probably the most fundamental.

Roundup

So, having spent a bit of time with all of these libraries, how do I feel about them? Let's bullet point it!

Mocker

  • Pros
    • Very explicit syntax
    • Verbose error messages
    • Very flexible
  • Cons
    • Doesn't support Python 3 and isn't under active development
    • Performance sometimes isn't very good, especially with patch()
    • Quite verbose

Flexmock

  • Pros
    • Clean, readable syntax for most operations
  • Cons
    • Syntax for chained methods can be very complex
    • Error messages could be improved

Fudge

  • Pros
    • Using @patch is really nice, syntactically
    • Examples showing web app testing are a nice touch
  • Cons
    • @patch can interfere with test runner operations (because it affects the entire interpreter?)
    • Partial mocking is difficult

Mock (preliminary)

  • Pros
    • Very flexible
  • Cons
    • Almost too flexible. All-accepting mocks make it easy to think you have better coverage than you do (so use coverage.py!)

Acknowledgements

Clearly, a lot of work has been put into these mock libraries and others. So I would like to extend some thanks:
  • Gustavo Niemeyer, for his work on Mocker.
  • Kumar McMillan, for his work on Fudge, and for helping me prepare material for this post.
  • Herman Sheremetyev, for his work on Flexmock.
  • Michael Foord, for his work on Mock, and for getting me on Planet Python.
Additionally, while I didn't work with them for this post, there are a number of other mock libraries worth looking at:
  • Dingus
  • Mox
  • MiniMock, which I've used quite a bit in the past, and I'm delighted to learn that development is continuing on it!

4 comments:

  1. Well, it's finally up. This ended up being a much bigger undertaking than I had expected, but it still feels incomplete.

    I'm very interested in any feedback, to either expand this post, or perhaps create a follow-up.

  2. Thanks to a quick chat with Michael Foord, I realized that my link for Mock, and some of my understanding of it, pointed to a different project.

    I've updated the link, and removed some of the inaccurate commentary, but the new documentation he pointed me to has quite a lot to digest, so I'll probably have to update this again in a couple of days with further conclusions.

  3. tldr; ;-) Yet, but I will! Thanks for putting this up: I've been planning to get around to investigating mocking in python for some time - your post looks like a good general overview and jumping off point.

  4. As it turns out, FlexMock does indeed verify expectations automatically, as long as it can integrate with your test runner. Unfortunately, I've been using PyDev for development, and it uses Exceptions for flow control at a certain point, which ends up leaving an exception in sys.exc_info, which FlexMock interprets as a reason not to do the verifying.
    I've filed a bug here:
    http://sourceforge.net/tracker/?func=detail&aid=3408057&group_id=85796&atid=577329
    and hopefully I'll be able to submit a patch shortly.
