Perfectly-timed optimization

I was going to write about specific vectorization tricks in Numpy and Matlab, but this is what I ended up with instead.  Stay tuned for the actual vectorization stuff.

Neuroscientists, and possibly biologists more generally, seem to take great pride in producing slow code.  They may even quote you Knuth’s famous line:

…premature optimization is the root of all evil.    -D. Knuth, 1974.

Having just gone and read the first section of that paper and done a bit of follow up on Wikipedia, I can now argue back: Knuth was writing at a time when people had only recently realized that programs should be structured as a series of functions and use such (seemingly basic) constructs as for/while loops and if/else blocks.  This was in contrast to the old style in which everything was basically a series of low-level jump instructions.  In this context, he pointed out: (1) programmers are typically poor at guessing which bit of their code is the bottleneck;  (2) generally 97% of the code doesn’t merit any kind of optimization; and (3), in his time, optimizing code usually meant taking something already spaghetti-like and turning it into – umm…- unpalatable spaghetti?

Those three arguments were of course valid, and remain so in many modern-day contexts, however programming has moved on since the 60s and 70s, and the original three arguments do not always hold. (Knuth hoped for a “Utopia 84” or “Newspeak” to appear within ten years of his article – C++ just about made the deadline, but it took a few more years to reach Java, Python, and the later improved versions of C++.)

So, lets talk about scientific analysis – the sort of thing people write in Matlab, Python (i.e. numpy), or some funky C++ library (I’ve never used Fortran so I’ll leave that to one side). In all cases the user is provided with amazingly compact syntax for doing fairly complicated things. For example, in python we can do:

# import stuff from numpy, and newaxis as nax
r = hypot(arange(-10,11)[:, nax], arange(-10,11)[nax, :])

Which means: position the range [-10, -9, ..., 9, 10] in the first two dimensions and then compute the hypotenuse at each point in the full two-dimensional space, giving:


I doubt whether that little snippet would have been a bottleneck in anyone’s analysis, so Knuth would have us ignore it from an optimization point-of-view.  However, if we know how to write and read this code why should we not do it like this?  It probably takes less effort than a pair of repmat‘s (my least favorite Matlab function – use bsxfun) and manual sqrt(x**2 + y**2). In fact, for some numbers you will get better results using hypot because it rearranges the equation to prevent overflow/underflow.  I certainly prefer this one-liner to anything more verbose, because the intention here is to make a “single thing” (by which I roughly mean “a pattern that can easily be understood at a glance with the appropriate plotting command”).  However, despite being concise and fast, I practically never seen this sort of code in scientific analysis, and get complaints when I write it myself.

If this really were a bottleneck, we would want to go further than this one-liner, using one or both of the following common tricks: (a) pre-compute once rather than re-compute every time; (b) take advantage of symmetry to do half, a quarter, or even an eighth of the work, possibly even requiring downstream code to use the minimal slice directly rather than expecting the full 2D array to be in memory.  Both of these optimizations complicate matters a little, for example caching can be dangerous in Numpy if you don’t explicitly set write=false (see setflags*) and you may need to handle arbitrary sizes of array, which could get messy. However, if this truly is taking up a serious percentage of run time it would probably be worth the 5 minutes effort needed to do this given that the speed payoff is as good as guaranteed.

However, to come back to the three points from Knuth….

It is almost certainly true it’s hard to predict the bottlenecks of large programs that involve lots of little bits of data or specialized algorithms. However if we are talking about basic linear algebra or scatter/gather operations (i.e. indexing) then it’s actually pretty easy to get a decent intuition.   Replacing “simple” code, with (“scary”) vectorized commands is almost always going to make that section of code 2-10 times faster and it will usually make it easier to read, so long as you get past the “scariness”.  For situations where things are already part-vectorized, or otherwise more complicated, it may be worth running the profiler to see what’s going on exactly – in modern IDEs, running a profiler requires minimal effort and can provide a wealth of useful information about control flow as well as bottlenecks.  Most people accept that the profiler is a useful tool when you’re really stuck for speed, however I would advocate using it right from the moment that you’ve finished drafting your function (and you have some representative data to run it with).   That is the time when:

  1. You have the best understanding of the science/logic behind the analysis. If you do need to optimize it later on it will require more effort to resurrect all the knowledge/thoughts you had previously.
  2. You understand your reasoning for drafting it a particular way. You may have made fundamental errors, or not landed on the absolutely most intuitive, or simplest, algorithmic form.  Thus, examining the function from a slightly different perspective (i.e. performance) may lead you to make corrections or readability improvements: redundant arithmetic can hurt both performance and readability.
  3. You can discover mistakes without fear of embarrassment or worse. Nobody likes discovering bugs hidden in old code, so optimizing functions months after they were written can be a nerve-wracking undertaking. In other words, doing the optimization early on can be a method of avoiding grief (although it may also be a recipe for allowing bugs to go unchecked in the longer term).
  4. You have the freedom to change exactly what the function does without affecting other code. As soon as you (or worse, other people!) start using the function, you lock yourself into accepting/providing particular inputs and outputs (python’s function call syntax helps but it’s not a magic bullet).  If you look at the function from a performance perspective, you may discover that the function can be made more general without significant performance degradation, or the opposite: made much faster for specific cases.  Either way it’s much easier to establish these facts early on and design with them in mind.
  5. You have the greatest to gain by improving performance. This is perhaps the most important point of all.  Often you will write a function, then gradually construct an analysis pipeline around it over the following days and weeks, and then finally actually run the whole thing on some real data/parameters.  By the time you realize that performance would have been helpful, you’ve already spent several seconds/minutes/hours/days staring at the screen, not doing anything particularly useful.  Also, every time you launch your slow concoction during development you have to work hard to preempt possible bugs so as to avoid wasting large amounts of time. You also have to be more selective in your choice of inputs, which means you might miss interesting results or waste effort thinking through stuff that the computer could be doing for you.

To wrap up then, I would say the following…

Bottlenecks can often be easily guessed. Many non-bottlenecks can be optimized with minimal effort and minimal impact on readability.  The profiler is very easy to use and can quickly help when you can’t guess, or are new to the language.  Optimizing early can lead to writing more interesting analyses further down the line, or free up time and mental effort for doing other stuff.  So, don’t wait to optimize: do it now!

*apparently write checks are implemented rather badly behind the scenes so this isn’t a proper guarantee.

This was my first post on wordpress. It took a lot longer to write than I’d expected and wasn’t particularly well structured.  I need to get better on both fronts if I am to do this regularly! Anyway, if you found it interesting, or want to argue back, please do so in the comments. As I said at the start, stay tuned for actual vecotrization tips.


One thought on “Perfectly-timed optimization

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s