Monday, September 26, 2011

Python Web Client 0.1.dev1

I built a preliminary version of the Socal Piggies Python Web Client. Take a look at the code and methodology, suggest features/fixes, and otherwise enjoy! There's no live demo because of a gaping security hole.

First off, I would just like to express my sincere appreciation for everyone who responded to my request for ideas. There were a number of interesting options, but for now, I've decided to build an implementation of the Python Web Client described on the Socal Piggies site. I made this choice because it's a comfortable area for me to work in, it's a tool I can see using, and probably most important, I think I can build a simple version fairly quickly. :)

The first thing I usually do with a new project is retreat to a quiet corner with a notebook and a writing implement. It would be nice to find an electronic way to do this, but so far nothing has come close to what I need in terms of offering a combination of structured and free-form input, along with instant availability.

In this case, I was trying to strip the concept down to the bare essentials. That means:
  • One page, with two widgets:
    • Create request (Enter a URL)
    • Display response (Status Code + Headers)
  • Startup script that launches a browser to the service
That last one may seem non-essential, but it speaks to my philosophy that "delivery is as important as development". In practice, that means that how a client is introduced to functionality is just as important as how well that functionality works. Make it really easy to start using.

Because this is somewhat of a showcase project, there's a couple of other things I pinned on my design list:
  • Testing (unit, functional, system, jsunit)
  • Docs (UI + API, published in Sphinx)
While I was putting this together, I also wrote down a whole lot of nice-to-have features for later. You can check them out on the Rally site I am (kind of) using to manage this project. You will need to log in:

Because I'm a TDD advocate, let's start with the tests. First I need to decide what tests I want to have. Since I don't know my code structure at all yet, I'm going to start with system level tests, which I usually define as tests that exercise the system at the UI level, running very close to how it will run in production:
  • Visit site, enter URL, press submit, verify results.
Gist-It for

That pretty much does it, and can be done with something like Selenium, unless I also wanted to test the startup script. Doing so would involve using something like Sikuli (which I do love) to observe the state of the desktop, but that might balloon the scope of this project a bit too much. So we are on to functional testing, in this case defined as testing the API of the web service, in as isolated an environment as we can create. So what are we looking at there?
  • Submit URL, verify response
    • Probably some variants of this, to test error handling or redirect responses (we are handling those, right?).
    • What happens if the URL to retrieve is the URL of the webservice itself? Could we experience some nastiness there?
So both of the test types we have addressed so far require an actual connection to another server. We could use something that's always going to be available, like, but we aren't really guaranteed a network connection. So for this, I'll write a small web server that can be set to return whatever you want it to.
Gist-It for
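A minimal Python 3 sketch of such a configurable stub server (my own version, not the post's gist) might look like this. The canned status and body are set as class attributes so each test can tweak them, and the server runs in a daemon thread:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    # Class attributes let each test configure the canned response.
    status = 200
    body = b"hello"

    def do_GET(self):
        self.send_response(self.status)
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)

    def log_message(self, *args):
        pass  # keep test output quiet

def start_stub_server():
    # Port 0 asks the OS for any free port; check server.server_address.
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A test would call `start_stub_server()`, point the client at the returned address, and call `server.shutdown()` in teardown.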

This was actually a bit trickier than I anticipated, due to the need to run the server in a separate thread/process, and this bug. Anyway, here's the API tests:
Gist-It for

Ok, so unit tests now. From our earlier tests, it's become pretty clear that the API will have one view, which accepts the details to construct a request (just a URL to start), submits that request, and returns the status code and headers from the response.
Well, that was all really boring. Maybe the jsunit tests will be more interesting? In practice, I probably won't write these before the code, because it still takes me a while to get into the rhythm of writing tests for javascript. I need a bit of trial-and-error.
  • Create Request Widget:
    • Enter text and press submit. A call should be made to create the request, and the deferred for that call should be passed to the page.
Gist-It for test_create_request.js

  • Display Response Widget:
    • Supply it with various responses, and confirm that they display properly. Probably the most interesting bit of testing of the whole lot.
Gist-It for test_display_response.js

Ok, so we have our tests, perhaps. Now, on to the implementation. This part is actually really simple.

There's the Python view:
Gist-It for
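Framework aside, the core of that view boils down to something like the following sketch (the function name and return keys are my own choices, not the project's actual API). It takes the URL, issues the request, and returns the status code and headers; the opener is injectable so tests can substitute a fake:

```python
from urllib.request import urlopen

def request_summary(url, opener=urlopen):
    """Issue a request to `url` and return its status code and headers."""
    response = opener(url)
    return {
        "status_code": response.status,
        "headers": dict(response.headers.items()),
    }
```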

and the two Javascript widgets:
Gist-It for create_request.js

Gist-It for display_response.js

Feel free to take a look at the GitHub repository for more details (or check it out to run it). 

I've left off the documentation for now, partly because I wanted to get this up soon, partly because it's been a while since I started anything with Sphinx, and partly because it really doesn't do much yet. I'm also still missing the launch script.

There's lots more work to be done here to make something useful, so I'm taking suggestions. But hopefully this gives you an idea of how you can build a simple and somewhat tested web-app using Pyramid and JQuery. You will notice that it is very testing heavy, probably significantly more than real-world deadlines would allow for. But once you get these tests in place (and a system to run them), they are fairly easy to build on, and can provide a safe container in which to experiment.
For the moment, I'm planning two distinct iterations:

  • Add docs and launch script, as well as displaying the response body. That will be 0.1
  • Flesh out the request creation ability, to allow settings headers and parameters. Along with hopefully some fixes/refinements, that will be 0.2
Beyond that, development will depend on whether this is interesting to anyone, so let me know.

Saturday, September 17, 2011

Sorting out

As someone who almost exclusively uses high-level languages, the idea of writing my own sorting algorithm seems a bit silly. However, I can certainly appreciate the algorithms used for sorting, and it's a neat opportunity to try to document and implement these algorithms using methods I am familiar with.


These implementations are likely to be extremely naive. If you went to CS school, you probably learned better implementations in your first year. There is absolutely no reason to use these in production code anywhere. The built-in Python sorting is 10-50 times faster than anything I've produced (I'm actually surprised it's not more). If you've got tips about how to speed these up just by changing the code around, feel free to comment, or you can try forking my repository and sending a pull request. I can't promise to spend much time on it, because this is pretty well-trod territory, but it would be a bit entertaining to see how close we can get.

Quick Sort
My understanding of quicksort is still developing, but based on what I've read on Wikipedia, it can work something like this:
  1. Grab an item in the list, called the "pivot".
  2. Create two lists, representing everything greater than the pivot, and everything less than the pivot.
  3. Repeat steps 1 and 2 on the two sub-lists, returning a list comprised of the smaller list items, then the pivot, then the larger list items.
In this way, the entire list should get sorted. It's my understanding there are various optimizations that can be done, such as swapping elements in place instead of creating new lists. It seems to be one of the "best" all-around algorithms, though for certain cases, its performance can be beaten by others.
Gist-it for
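The three steps above can be sketched naively like this (my own version, building new lists rather than swapping in place):

```python
def quicksort(items):
    """Naive quicksort: pick a pivot, partition, recurse on both halves."""
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)
```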

This was the second fastest list implementation I created, at about 1/12th the performance of the built-in sorting.

Merge Sort
Merge sort relies on splitting the list recursively until you have an ordered hierarchy of 1-item lists. Then you compare each list with its neighbour, and return a combined list of the two merged together. Now you have 2-item sorted lists, which you work through, comparing the left-most element to its neighbour's left-most element, and moving the lowest one to a new list, eventually creating a 4-item list.

Gist-it for
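The split-then-merge idea above, sketched in a simple recursive form (my own version):

```python
def merge(left, right):
    """Merge two sorted lists by repeatedly taking the smaller head."""
    merged = []
    while left and right:
        if left[0] <= right[0]:
            merged.append(left.pop(0))
        else:
            merged.append(right.pop(0))
    return merged + left + right

def mergesort(items):
    """Split until lists are one item long, then merge pairs back up."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    return merge(mergesort(items[:mid]), mergesort(items[mid:]))
```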

Heap Sort
Heap sort relies on understanding what a "heap" is, in this context. So let's do that.

A heap is a tree-based data structure that follows the rule that all children of a node have a value lower than or equal to the value of that node (for a max-heap; for a min-heap the rule is reversed). In practice, heaps are usually implemented within arrays, with the first or last item of the array being the root of the tree.

Now, why do we have a heap? What's the advantage? Apparently sorting things. :)

In a heap sort, we first build the heap from the list we've been given. Once we have our heap, we know the item with the highest value is at the root of the heap, which is position 0 in the list representing the heap. So now we swap the first and last elements in that list, and either remove the last element from the list, or simply treat that part of the heap as already sorted (which means our heap needs to know to ignore it somehow). Then we take the first item, and trickle it down through the heap until the heap is proper again. This consists of comparing its value with its children, and swapping it with the higher of them, until neither child is higher.

Gist-it for
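One naive way to write the build/swap/trickle-down process described above (my own sketch, using the usual array layout where the children of index i sit at 2i+1 and 2i+2):

```python
def sift_down(heap, start, end):
    """Trickle heap[start] down until both children are smaller (max-heap)."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        # Pick the larger of the two children, if there are two.
        if child + 1 <= end and heap[child] < heap[child + 1]:
            child += 1
        if heap[root] < heap[child]:
            heap[root], heap[child] = heap[child], heap[root]
            root = child
        else:
            return

def heapsort(items):
    items = list(items)
    n = len(items)
    # Build the max-heap by sifting down every non-leaf node.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(items, start, n - 1)
    # Swap the root (max) to the end, shrink the heap, restore the invariant.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(items, 0, end - 1)
    return items
```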

Python does have a built-in heap library (heapq), which might be worth benchmarking, as heap was my slowest sort implementation.

Bubble Sort
Bubble sort is fairly trivial.

Iterate through the items (except the last one); if an item is greater than the next one, swap their positions. When you reach the end of the list, do it again, unless there were zero swaps.

Gist-it for
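That description translates almost directly into code (my own sketch):

```python
def bubble_sort(items):
    """Repeatedly sweep the list, swapping adjacent out-of-order pairs,
    until a full sweep makes zero swaps."""
    items = list(items)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items
```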

Insertion Sort
Insertion sort is another trivial sort. I'm really only including it because I think my (unwitting) choice of an insertion sort in an interview may have cost me a job.

Gist-it for
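For reference, a typical insertion sort grows a sorted prefix and slides each new item back into its proper spot (again, my own sketch rather than the embedded gist):

```python
def insertion_sort(items):
    """Grow a sorted prefix by inserting each item into its proper spot."""
    items = list(items)
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]  # shift larger items one slot right
            j -= 1
        items[j + 1] = current
    return items
```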

Interestingly enough, for my stupid example of sorting a small list of characters, insertion sort is the fastest implementation I created.

Acknowledgements and Conclusion

Not really much to conclude, except that I now have a better understanding how to write a sorting algorithm, and some of the pros and cons of different algorithms. I certainly can't look at a data set and give you the big O case for a given algorithm.

But one thing I do want to draw special attention to is Wikipedia's pages on this subject. They vary a bit in information density, but they were incredibly informative, and make use of numerous methods to help communicate how each sort works. If you really want to understand this stuff, I would definitely start with these pages:

Also, my thought to do this post was sparked by another post on Planet Python that mentioned Insertion Sort, where I recognized the algorithm. I'm not entirely sure, but this may be that post.

Thursday, September 15, 2011

Let me build you a web app/service!

So, according to my introduction, one of my goals for this project is to build a web app or service to demonstrate my skillset in that area. Unfortunately, it turns out my idea for a web app has already been done, and is quite well-known (on the plus side, I now understand what 4Chan is). At the moment, I'm drawing blanks about what I should build, so I thought perhaps I could crowdsource that job.

So if you have an idea for a web app or service, but not the time or skills to get it started, or if you would simply like to collaborate with someone on it, post a comment describing your idea.

In general, I'm game for pretty much anything, as I'm confident I can learn whatever I need to. Some general parameters though:

  • It's going to be in Python, built on Pyramid, using SQLAlchemy for models
  • I will try to strip the idea to its core, and leave others to make it full-featured
  • I will try to make an appealing and usable UI, but again, it will be pretty barebones
  • I'm going to blog about it, host it in github and make the code viewable by everyone
  • If you want to collaborate:
    • I'm open to hosting it elsewhere, as long as it's publicly accessible
    • I will be a stickler in design discussion and code reviews
Finally, I will happily put it under any license you want and grant you any copyright or whatnot, as long as it doesn't prevent me from achieving my objectives vis-a-vis skill demonstration.

Tuesday, September 13, 2011

Design patterns

Design patterns are a very broad subject, but as far as I can tell, they are primarily common conventions for structuring software. I've never been taught them, and as far as I can tell, I've used nearly all of them, repeatedly. So for an experienced software developer, I don't believe studying design patterns is likely to improve your coding skills much, if at all. However, it could certainly improve your collaboration skills.

Words Matter
If there's one downside to being self-taught in the field of computer programming (there are several), it would be the difficulty of communicating your ideas to those who have a "standard" education. However good my code may be, it doesn't matter if I spend all my time trying to explain how I'm going to implement something. Formal computer science comes with a whole boatload of terminology, design patterns among them. Being able to say "use the template method pattern to defer processing to subclasses" is much more concise than "in the superclass method, call methods that are overridden in the subclass, but have only stub implementations in the superclass". Even that verbose description is concise compared to what usually comes up when discussing actual project code.

Template Method
The template method pattern is something that eventually comes in handy for anyone who does a fair amount of object oriented design. It's a response to the problem of how to mix specialized behaviour of subclasses with general behaviour of superclasses. The basic idea is that the superclass implementation of a method consists mostly of calls to other methods of the object, which are mostly abstract in the superclass and implemented in the subclasses; the subclasses don't override the template method itself.
I recently used this on an email reporting system, which used either HTML or text email depending on the job being reported on. There was a fair amount of commonality between both types, in that they dealt with collecting data from the environment and stuffing it into some kind of template (the use of the word template here is unrelated), then formatting that template into an email and sending it along.
The actual email sending was almost the same for both HTML and text, so with just a condition or two we could handle that entirely in the superclass. However, the data that was sent out varied a bit between the two types, as did the emails themselves.
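A stripped-down sketch of that email reporter shape might look like the following (class and method names are my own invention, not the original system's):

```python
class ReportEmailer:
    """Superclass: send_report is the template method defining the flow."""

    def send_report(self, job):
        data = self.collect_data(job)   # subclass hook
        body = self.format_body(data)   # subclass hook
        return self.send(body)          # shared between HTML and text

    def collect_data(self, job):
        raise NotImplementedError

    def format_body(self, data):
        raise NotImplementedError

    def send(self, body):
        # Stand-in for the (mostly shared) SMTP delivery code.
        return "sent: " + body

class TextReportEmailer(ReportEmailer):
    """Subclass supplies the specialized hooks, never send_report itself."""

    def collect_data(self, job):
        return {"job": job, "status": "ok"}

    def format_body(self, data):
        return "Job %(job)s finished: %(status)s" % data
```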

Note that the implementation above is not ideal, because the subclasses do need a certain awareness of the superclass implementation in order to behave properly.

Mediator
The mediator pattern, in my understanding, mostly deals with managing the flow of communication between objects. It puts a layer between your object and its intended communication target. I use this pattern frequently for building UIs, most of which I build in JQuery, so that's where I use it.
The way I build UIs in JQuery has been a constantly evolving pattern, so this may change, but right now it gives me the flexibility I need. What I usually do is create an object representing the page itself, as well as objects to represent each of the distinct interface areas, which I am going to call widgets (though that might be too specific of a term in this context). Each of the widgets takes zero or more callback functions, depending on what they do, and the page object is responsible for seeing that these point to the same things. In some cases, they are methods of other widgets (though in typical javascript fashion, they are anonymous functions that call those methods), but more often they are methods of the page object, which accept the communication and see it dispatched to any other widget(s) that are interested. The page object is almost never more than wiring between the widgets.

The following is probably not syntactically correct. It frequently takes a bit of trial and error before I get into the swing of JavaScript OOP.

Gist: "Mediator Pattern example"
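The same wiring can be sketched in Python, stripped of the JQuery specifics (widget and method names are mine). The widgets never reference each other; the page hands out callbacks and routes everything:

```python
class RequestWidget:
    def __init__(self, on_submit):
        self.on_submit = on_submit  # callback supplied by the mediator

    def submit(self, url):
        self.on_submit(url)

class ResponseWidget:
    def __init__(self):
        self.displayed = None

    def display(self, response):
        self.displayed = response

class Page:
    """Mediator: the widgets never talk to each other directly."""

    def __init__(self):
        self.response_widget = ResponseWidget()
        self.request_widget = RequestWidget(on_submit=self.handle_submit)

    def handle_submit(self, url):
        # In the real app this would issue the HTTP request.
        self.response_widget.display("fetched " + url)
```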

Strategy
The strategy pattern deals with abstracting away algorithms. In situations where a different context might require a different algorithm to achieve the desired results, you apply the strategy pattern. This might sound complex, but in practice, it is as simple as using different validation functions on different data types, as long as you have a common interface to those functions.

Gist: "Strategy Pattern example"
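In the validation terms above, a minimal sketch might be (validator names and fields are hypothetical):

```python
def validate_int(value):
    """Strategy for numeric fields."""
    return value.isdigit()

def validate_nonempty(value):
    """Strategy for free-text fields."""
    return bool(value.strip())

# The strategy is chosen per field; callers only know the common interface.
VALIDATORS = {"age": validate_int, "name": validate_nonempty}

def validate_field(field, value):
    return VALIDATORS[field](value)
```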

Adapter
The adapter pattern is one that, thanks to Zope 3, I've been familiar with for a long time. An adapter is an object that "wraps" another object to provide a desired interface. Designing a system based on the adapter pattern requires using capital-I Interfaces for establishing object communication. Then, as necessary, adapters are created to allow the available objects to satisfy those interfaces.
To a certain degree, the adapter pattern takes the concept of polymorphism, traditionally understood as being able to use a superclass' interface to interact with instances of subclasses, and expands it. So you can use the specification of an Interface to interact with any objects that are adaptable to that Interface. To use the traditional Bicycle example, Bicycle might extend Vehicle, and have traditional vehicular methods. However, maybe we want to check what color the bike is, or paint it. Vehicle may not have methods for this, but IPaintable might. Then, with a BicyclePainter adapter, we can adapt our Bicycle instance to this interface. (For the record, I've never found the Bicycle example very useful, and I'm not sure if I'm doing you any favours by using it here).

Gist: "Adapter Pattern example"
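Sticking with the Bicycle example, a bare-bones sketch (without Zope's interface machinery, which would normally handle the lookup) might be:

```python
class Bicycle:
    """An existing object that knows nothing about painting interfaces."""
    color = "red"

class BicyclePainter:
    """Adapter: wraps a Bicycle so it satisfies an IPaintable-style
    interface (get_color/paint)."""

    def __init__(self, bike):
        self.bike = bike

    def get_color(self):
        return self.bike.color

    def paint(self, color):
        self.bike.color = color
```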

So there's some design patterns, and their usage. It's by no means comprehensive, and I'm sure many of you will have encountered these in other contexts where my definitions aren't entirely accurate. That's just fine, because the value of these is not in having rigid development patterns, it's having useful ways to elevate the discussion of your development. I've almost never encountered clearly recognizable design patterns in code I inherited, because patterns are theory, and when applied, the real world makes things start to get messy.


Thursday, September 8, 2011

Mocking snakes

Many years ago, I was tasked with improving the performance of a suite of unit tests. They were taking ever longer to run, and were beyond 20 minutes when I started working with them. Needless to say, this meant people rarely ran them.
From Miško Hevery:
What I want is for my IDE to run my tests every time I save the code. To do this my tests need to be fast, because my patience after hitting Cntl-S is about two seconds. Anything longer than that and I will get annoyed. If you start running your tests after every save from test zero you will automatically make sure that your test will never become slow, since as soon as your tests start to run slow you will be forced to refactor your tests to make them faster.
The problem was that every test was descended from a common base class, and that class brought up fake versions of most of the application. Well, mostly fake versions. There was still a lot of I/O and network activity involved in bringing up these fakes.

The solution turned out to be mock objects, JMock in this particular case. For those unfamiliar, mock objects are objects that can stand in for your dependencies, and can be programmed to respond in the particular manner necessary for whatever it is you are testing. So if your network client is supposed to return "oops" every time the network connection times out, you can use a mock to stand in for the network connection, rather than relying on lady fortune to drop the network for you (or doing something terrible, like having code in your test that disables your network interface).

There are a couple of general drawbacks to using mock objects, but the primary one is that a mock object only knows what you tell it. If the interface of your dependencies change, your mock object will not know this, and your tests will continue to pass. This is why it is key to have higher level tests, run less frequently, that exercise the actual interfaces between objects, not just the interfaces you have trained your mocks to have.

The other drawbacks have more to do with verbosity and code structure than anything else. In order for a mock to be useful, you need a way to tell your code under test what dependency it is standing in for. In my code, this tends to lead to far more verbose constructors, that detail every dependency of the object. But there are other mechanisms, which I will explore here.

For a more verbose comparison of mock libraries in a variety of use cases, check this out:

Hopefully this post will be a more opinionated supplement to that.

There are a couple of categories of things to mock:
  • Unreliable dependencies (network, file system)
  • Inconsistent dependencies (time-dependent functionality)
  • Performance-impacting dependencies (pickling, hashing functions, perhaps)
  • Calls to the object under test
The last item is certainly not a necessity to mock, but it does come in handy when testing an object with a bunch of methods that call each other. I'll refer to it as "partial mocking" here.

For this article, I'm going to focus on four mock object libraries: Mocker, Flexmock, Fudge, and Mock. The first three were chosen primarily because they are the ones I have experience with; Mock I added in, though I don't have much experience with it yet. I believe, from my more limited experience with other libraries, that these provide a decent representation of different approaches to mocking challenges.

I'm going to go through common use cases, how each library handles them, and my comments on that. One important note is that I generally don't (and won't here) differentiate between mocks, stubs, spies, etc.

Getting a mock

Dependencies are usually injected in the constructor, in a form like the following:
Gist "Verbose dependency specification for mocking"
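The shape is roughly this (a hypothetical sketch; the class and parameter names are mine). Real modules are the defaults, and a test hands in fakes:

```python
import os.path
import time

class PageFetcher:
    """Every dependency arrives via the constructor, so a test can
    hand in mocks instead of the real modules."""

    def __init__(self, opener, path_module=os.path, sleep=time.sleep):
        self.opener = opener
        self.path_module = path_module
        self.sleep = sleep

    def fetch_to(self, url, directory, filename):
        self.sleep(1)  # e.g. crude rate limiting
        target = self.path_module.join(directory, filename)
        return target, self.opener(url)
```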

This is verbose, especially as we build real objects, which tend to have many dependencies, once you start to consider standard library modules as dependencies. :)

NOTE: Not all standard library modules need to be mocked out. Things like os.path.join or date formatting operations are entirely self contained, and shouldn't introduce significant performance penalties. As such, I tend not to mock them out. That does introduce the unfortunate situation where I will have a call to a mocked out os.path on one line, and a call to the real os.path on the next:
Gist: "Confusion when not everything is mocked"

This can certainly be a bit confusing at times, but I don't yet have a better solution.

However, it is quite explicit, and avoids the need for a dependency injection framework. Not that there's anything wrong with using such a framework, but doing so steepens the learning curve for your code.

Verifying Expectations

One key aspect of using mock objects is ensuring that they are called in the ways you expect. Understanding how to use this functionality can make test driven development very straightforward, because by understanding how your object will need to work with its dependencies, you can be sure that the interface you are implementing on those dependencies reflects the reality of how it will be used. For this and more, read Mock Roles, Not Objects, by Steve Freeman and Nat Pryce.

...anyway, verification takes different forms across libraries.
Gist: "Verifying mock expectations"
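As one concrete illustration, Mock (which later became `unittest.mock` in the Python standard library) uses after-the-fact assertion helpers rather than up-front expectations. The function under test here is a throwaway example of my own:

```python
from unittest import mock

def wait_and_fetch(client, sleep):
    """Toy code under test: sleeps, then fetches a page."""
    sleep(6)
    return client.get("http://example.com")

m_client = mock.Mock()
m_sleep = mock.Mock()
wait_and_fetch(m_client, m_sleep)

# Verification happens after the call, via assertion helpers:
m_sleep.assert_called_once_with(6)
m_client.get.assert_called_once_with("http://example.com")
```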

Partial Mocks

Partial mocking is a pretty useful way to ensure your methods are tested independently from each other, and while it is supported by all of the libraries tested here, some make it much easier to work with than others.
Gist "Partial mocks"

Chaining Attributes and Methods

I'm of the opinion that chained attributes are generally indicative of poor separation of concerns, so I don't place too much weight on how the different libraries handle them. That said, I've certainly had need of this functionality when dealing with a settings tree, where it can be much easier to just create a mock if you need to access settings.a.b.c.
Chained methods are sometimes useful (especially if you use SQLAlchemy), as long as they don't impair readability.
Gist "Chaining methods and attributes"


Error Messages

An important part of any testing tool is how informative it is when things break down. I'm talking about the detail of error messages, traceability, etc. There's a couple of errors I can think of that are pretty common. For brevity, I'm only going to show the actual error message, not the entire traceback.

Note: Mock is a bit of an odd duck in these cases, because it lets you do literally anything with a mock. It does have assertions you can use afterwards for most cases, but if an unexpected call is made on your mock, you will not receive any errors. There's probably a way around this.

Arguments don't match expectations, such as when we call time.sleep(4) when our expectation was set up for 6 seconds:
Mocker: MatchError: [Mocker] Unexpected expression: m_time.sleep(4)
Flexmock: InvalidMethodSignature: sleep(4)
Fudge: AssertionError: fake:time.sleep(6) was called unexpectedly with args (4)
Mock: AssertionError: Expected call: sleep(6)
Actual call: sleep(4)
When I first encountered Flexmock's InvalidMethodSignature, it threw me off. I think it could certainly be expanded upon. Otherwise, Mock and Fudge have very nice messages, and as long as you know what was supposed to happen, Mocker's is perfectly sufficient.

Unexpected method called, such as when you misspell "sleep":
Mocker: MatchError: [Mocker] Unexpected expression: m_time.sloop
Flexmock: AttributeError: 'Mock' object has no attribute 'sloop'
Fudge (patched time.sleep): AttributeError: 'module' object has no attribute 'sloop'
Fudge: AttributeError: fake:unnamed object does not allow call or attribute 'sloop' (maybe you want Fake.is_a_stub() ?)
Mock: AssertionError: Expected call: sleep(6)
Not called
Mock doesn't tell you that an unexpected method was called. Mocker has what I consider the best implementation here, because it names the mock the call was made on. The second Fudge variant is good, but because you might encounter it or the first variant depending on context, Fudge overall is my least favourite for this. Flexmock simply defers handling this to Python.

Expected method not called:
Mocker: AssertionError: [Mocker] Unmet expectations:
=> m_time.sleep(6)
 - Performed fewer times than expected.
Flexmock: MethodNotCalled: sleep(6) expected to be called 1 times, called 0 times
Fudge: AssertionError: fake:time.sleep(6) was not called
Mock: AssertionError: Expected call: sleep(6)
Not called
I think they all do pretty well for this case, which is good, because it's probably the most fundamental.


So, having spent a bit of time with all of these libraries, how do I feel about them? Let's bullet point it!


Mocker

  • Pros
    • Very explicit syntax
    • Verbose error messages
    • Very flexible
  • Cons
    • Doesn't support Python 3 and not under active development
    • Performance sometimes isn't very good, especially with patch()
    • Quite verbose


Flexmock

  • Pros
    • Clean, readable syntax for most operations
  • Cons
    • Syntax for chained methods can be very complex
    • Error messages could be improved


Fudge

  • Pros
    • Using @patch is really nice, syntactically
    • Examples showing web app testing is nice touch
  • Cons
    • @patch can interfere with test runner operations (because it affects the entire interpreter?)
    • Partial mocking is difficult

Mock (preliminary)

  • Pros
    • Very flexible
  • Cons
    • Almost too flexible. All-accepting mocks make it easy to think you have better coverage than you do (so use!)


Clearly, a lot of work has been put into these mock libraries and others. So I would like to extend some thanks:
  • Gustavo Niemeyer, for his work on Mocker.
  • Kumar McMillan, for his work on Fudge, and for helping me in preparing material for this post.
  • Herman Sheremetyev, for his work on Flexmock
  • Michael Foord, for his work on Mock, and for getting me on Planet Python
Additionally, while I didn't work with them for this post, there are a number of other mock libraries worth looking at:
  • Dingus
  • Mox
  • MiniMock, which I've used quite a bit in the past, and I'm delighted to learn that development is continuing on it!

Wednesday, September 7, 2011

Blogging and not blogging

It's been a month since I started working on this blog, and the only thing I have gotten up is my introductory post, which talks about all the other stuff I want to write. Not quite the start I wanted to get off to, admittedly, but August managed to be much busier than I would have thought, me lacking a job and all.

That said, I do have a bone to pick with Blogger. It's convenient, free, and interfaces with all my stuff, but writing things that include code has been a real pain in the butt. I could inline it as a quote, but I don't get syntax highlighting. It seems I can modify my template to include plugins for syntax highlighting, but given that I would like these bits to be copy-pastable, including the actual code in the HTML seems like a recipe for trouble.

So I am currently using "gist" for my samples, and embedding them with script tags. This is nifty, certainly, but I'm guessing it doesn't come across well in RSS, and probably to aggregators like Planet Python (which is probably where you are reading this). Furthermore, if I'm already maintaining a github account to get the gists, and for the larger project work, it seems like it would be much more straightforward if I could embed snippets directly from github, so I could do my editing and testing in PyDev.

I'm sure I'm not the first person to run up against this, so we will see if my Planet Python syndication can bring any knowledgeable commentators my way. If you have any suggestions, especially ones where you can link to examples, that would be very much appreciated.