A while back I had some thoughts on communication. If you’ve ever played World of Tanks Blitz, you know that it’s basically a team of tanks against another team of tanks. Given the pick-up, fast-paced nature of the game, communication is minimal at best (sometimes limited to a single “<<<<<<<<<” or “>>>>>>>>>” indicating which direction to take the offense). I found that teams that could coordinate with minimal communication, play their tank roles (scouts, mediums, heavies, and destroyers), and move fast could achieve overwhelming victories. Something similar is probably true in an agile/teamwork environment. Know your stuff, know your role, take opportunities, work together, succeed.

I was tinkering around with replacing the print statement with the print function in a Python 2 script when I ran across this peculiar oddity.

Notice that the sequence just imports the future print function under a different name in each iteration.  The oddity is that the first import fails, but the third (which is exactly the same) succeeds after the second has been performed.

** Note: I don’t know if replacing print with the print function from the __future__ module is a wise thing to do.  I was simply using it while trying out some code. **

Versions tested: Python 2.7.10 (OSX), Python 2.7.6 (Ubuntu 14.04).

Have you encountered the following scenario?

You are trying to solve a problem (or helping solve a problem) and know or at least think you know the solution.  You are in the middle of implementing it when someone else looks at it and says, “why don’t you do it this way, isn’t this way easier/better?”  Taking a step back, you realize that the question not only has merit but is a better and much more obvious solution; you can’t believe you missed it.

What happened?

I think it’s because you were too close to the problem and had developed a very narrow focus.  That narrow focus prevented you from seeing the better solution.  Perhaps this is even a variation of functional fixedness, in that we’ve latched onto an idea of how to solve a problem and our minds may not see alternatives easily.

What can we do?

  • Think about the broad (or product) level goals regularly.
  • Entertain questions and/or suggestions from others.
  • Ask: “Is this the best way?”
  • Ask: “Is this the practical way?”
  • Don’t overthink the problem.
  • Get it working then evaluate the solution and/or do a code review!

The Python logging module offers a wide variety of logging options and handlers.  One thing missing from the documentation is guidance on when to use each level.

A quick foreword

You really should familiarize yourself with the logging package, including how to create new loggers (I find creating them by module very useful).  There are many ways to configure logging; I tend to like dictConfig from logging.config (but start off with basicConfig from logging).

A Word on Optimal Setups

I prefer to set up my logging with each module having its own logger.  This allows me to configure logging levels at a package and/or module level.  I typically do the following in each module to create a logger.

Assuming my package structure consists of the following:

– foo (package)
—– core (module)
—– bar (module)

We can configure varying levels of logging for each element, as seen in the following snippet from a dictConfig.

In this example, the root ('') logger (which covers any logger not configured by other settings) reports INFO level and up messages.  With the exception of the bar module, the foo package only reports WARNING level and up messages.  The bar module is set to a more verbose DEBUG level, to show information needed for debugging.
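The configuration described above could look roughly like the following dictConfig. This is a sketch rather than the original snippet: the formatter string and the "console" handler name are assumptions, and only the three logger levels come from the text.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # Assumed format; pick whatever fields you need.
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        # Assumed handler name; writes to stderr by default.
        "console": {"class": "logging.StreamHandler", "formatter": "simple"},
    },
    "loggers": {
        "": {"handlers": ["console"], "level": "INFO"},  # root logger
        "foo": {"level": "WARNING"},                     # whole foo package
        "foo.bar": {"level": "DEBUG"},                   # bar module only
    },
}

logging.config.dictConfig(LOGGING)
```

Loggers without an explicit level, such as foo.core, inherit the level of their nearest configured ancestor, which is why foo.core ends up at WARNING.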

Selecting A Log Message Level

Out of the box, there are six default logging levels recognized by the logging module, most are self-explanatory.  I’ll just make some notes about usage.  (From here on out, I’ll refer to my logging instance as logger.)

For general status messages, you should use logger.info (INFO).  For errors, use either logger.critical (CRITICAL) or logger.error (ERROR).  For all exceptions, use logger.exception (ERROR); it will automatically include stack trace information about the exception in the log.  When you want verbose debugging information, use logger.debug (DEBUG).
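To make the level choices above concrete, here is a small sketch written for Python 3. The logger name, messages, and values are all hypothetical; output is captured in a StringIO only so the result can be inspected.

```python
import io
import logging

logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output out of the root logger

# Capture output in memory so it can be examined afterwards.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.debug("cache contents: %s", {"a": 1})           # verbose detail
logger.info("service started on port %d", 8080)        # general status
logger.error("could not reach %s", "db.example.invalid")  # an error

try:
    1 / 0
except ZeroDivisionError:
    # Logged at ERROR level, with the traceback appended automatically.
    logger.exception("division failed")

output = stream.getvalue()
```

Note that logger.exception is meant to be called from inside an except block, since that is where the exception information is available.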

In Closing

  • Use the logging module instead of print statements.
  • Always use logger.exception for logging exceptions.
  • Favor logger.debug for verbose log statements.
  • Favor logger.info for most other log statements (with the exception of errors).
  • Don’t forget that each of the logging functions uses C-style formatting.

I’m tinkering with some financial analysis scripts, so when I got to looking into useful Python packages, the Technical Analysis Library popped up.  The Python bindings require TA Lib (Technical Analysis Library), which on OS X is available via Homebrew.  Now, when I originally installed it, I didn’t want to install it globally, so I’ve got the less preferred local install setup.  This local install makes the following commands necessary to get the pip package to install correctly.

Now that I’ve brew installed TA Lib and set the new include and library path, I can install the python bindings via pip.

 

We all know that getting to a minimum viable product (MVP) is a race.  It is a race against competition, market need, industry direction, and so on.  I came to a realization recently that reaching MVP can also be a race against yourself and how long your technological choices hold out.

For example, let us say that I have chosen a specific UI framework.  I chose it because it satisfied a good portion of my initial requirements out of the box, was well established, and was generally well maintained.  This UI framework saves me a lot of time and money and lets me get something up and going quickly.  Now that I’ve selected a UI framework and have done some work, the requirements grow and evolve, and the framework begins showing its age.  Assuming that this happens before I reach MVP, I am left with a bit of a problem: reevaluate and possibly retool the system, or keep moving forward.  Retooling the system will mean a step backward and will slow down my timeline to market, but continuing means incurring more debt that will have to be repaid later.  What is the right choice in this situation?  I believe you have to play it by ear, but favor sticking with your choices for as long as possible.

Background

Sometimes we may generate or retrieve a list, set, or even dict when creating a collection of things that we will be testing membership against.  Theoretically a set, frozenset, or dictionary should be the fastest (and equivalent) forms of storage for this operation.  However, what I’ve found is that on some systems the set is faster and on others the dict is faster.  This difference may be very important if you’re writing real-time or close to real-time software.  So where am I going with this?

Big-O Notation – Advertised Complexity

Python has published the expected time complexities of their collection types.  I’ve copied the ones for the in operator below.  These Big-O numbers are exactly what you would expect since everything but a list is implemented using a hashing algorithm.  It should be noted, however, that the speed of the set, frozenset, and dict can be compromised if the objects stored do not implement a good hashing algorithm.

Type        Average   Worst
list        O(n)
set         O(1)      O(n)
frozenset   O(1)      O(n)
dict        O(1)      O(n)


More: Python Time Complexity
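The caveat above about poor hashing can be illustrated with a small sketch. The `Key` and `BadKey` classes are hypothetical; they count equality checks so the effect of collisions shows up without any timing. `BadKey` deliberately returns the same hash for every instance.

```python
class Key:
    """Hypothetical key type that counts equality checks."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        type(self).comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)  # distinct values spread across the table


class BadKey(Key):
    """Same key, but every instance hashes to the same value."""
    comparisons = 0

    def __hash__(self):
        return 1  # deliberate worst case: all keys collide


def count_lookup_comparisons(key_type, size=200):
    """Build a set of keys, then count __eq__ calls for one lookup."""
    members = {key_type(i) for i in range(size)}
    key_type.comparisons = 0  # ignore comparisons made while building
    found = key_type(size - 1) in members
    return found, key_type.comparisons


good_found, good_comparisons = count_lookup_comparisons(Key)
bad_found, bad_comparisons = count_lookup_comparisons(BadKey)
```

With a well-distributed hash the lookup needs very few equality checks, while the colliding version has to compare against many stored keys, which is the O(n) worst case in the table.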

What I Found

Going back to my statement above, I found that on certain machines Python sets were faster and on others Python dicts were faster.  I cannot directly replicate sets being faster in all cases, so I tried to replicate it with a RHEL 7.1 machine on AWS.  Given that I was at an optimal case for the collection (no collisions), I would have expected set, frozenset, and dict to at least perform on par with each other.  I was surprised to find that, with the default Python interpreter, my tests showed dicts to be faster.  So I reran the tests with the corresponding version of PyPy and found that the expected results hold true: set and frozenset operate at virtually the same speed as dict.  I suspect the primary reason for the differences is the compiler used to build the Python binaries.  It was interesting, however, that PyPy performed as expected on all systems.

The Data

I ran the benchmarks on OSX, Ubuntu 14.04, and RHEL 7.1 (courtesy of AWS Free Tier), though I opted not to record the RHEL results as they are similar to the Ubuntu results.

Benchmark        Time   Fastest   % Difference
OSX
  Python
    list         5.47             150.641
    set          0.85               9.877
    frozenset    0.85               9.877
    dict         0.77   0.77        0.000
  PyPy
    list         0.34              89.362
    set          0.13   0.13        0.000
    frozenset    0.13   0.13        0.000
    dict         0.13   0.13        0.000
Ubuntu
  Python
    list         6.07             123.733
    set          1.44               0.697
    frozenset    1.49               4.110
    dict         1.43   1.43        0.000
  PyPy
    list         0.78             102.913
    set          0.25   0.25        0.000
    frozenset    0.25   0.25        0.000
    dict         0.26               3.922

Recommendations

If you need to create a collection to test for existence, as in this example, favor set, frozenset, or dict, whichever makes sense for your situation.  If you are working with a list you’re given and you want to speed up the system, you can consider changing the list to a set.
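As a rough illustration of that recommendation (not the benchmark code from the repository), the sketch below times the in operator on a list and on hashed collections built from it. The data, the `time_membership` helper, and the repetition count are all assumptions chosen for the example.

```python
import timeit


def time_membership(container, needle, number=1000):
    """Return the total seconds taken by `number` membership tests."""
    return timeit.timeit(lambda: needle in container, number=number)


items = list(range(10000))  # hypothetical data handed to us as a list
needle = items[-1]          # last element: the worst case for a list scan

# Converting once costs O(n), so it pays off when lookups are repeated.
as_set = set(items)
as_frozenset = frozenset(items)
as_dict = dict.fromkeys(items)

timings = {
    "list": time_membership(items, needle),
    "set": time_membership(as_set, needle),
    "frozenset": time_membership(as_frozenset, needle),
    "dict": time_membership(as_dict, needle),
}

for name, seconds in sorted(timings.items(), key=lambda pair: pair[1]):
    print("%-10s %.6f" % (name, seconds))
```

The absolute numbers will vary by machine and interpreter, which is the point of the measurements above; only the relative ordering on your own system is meaningful.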

The Code

I’ve uploaded all the code to GitHub.  It is available here: https://github.com/chaddotson/container-membership-benchmark/.

The Scenario

This scenario illustrates two possible mistakes people make when using the python logging module.  Analyze the following code and look for issues.
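The original listing is not reproduced here, so the following is a hypothetical stand-in that exhibits the kind of code in question. The `parse_port` function and its message are invented for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port(text):
    """Hypothetical helper: convert text to an int port, or None on failure."""
    try:
        return int(text)
    except ValueError as e:
        # Builds the message with str.format, then logs it at ERROR level.
        logger.error("Could not parse port {0!r}: {1}".format(text, e))
        return None
```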

So what is wrong with that?

First and foremost, the code fails to use the existing logger.exception function that could, and in most cases should, be used when logging exceptions.  That function will automatically add all the exception info to the log, meaning that you will have the stack trace!  Secondly, this sample used the str.format method to format the log message for the logging library, when the logging library can in fact handle string formatting itself via old-style format specifiers.

Fixing it

If I were to ignore the first problem, the following code is what I should have written.  The benefit here is that the formatting is only executed if the log message is to be captured, unlike the first method.
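A small sketch of why deferred formatting matters. The `Expensive` class is hypothetical; it counts how many times it is converted to a string, so we can see whether formatting actually happened for a message that gets filtered out.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is rendered."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive"


logger = logging.getLogger("lazy_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)         # DEBUG messages are filtered out
logger.addHandler(logging.NullHandler())
logger.propagate = False

eager = Expensive()
lazy = Expensive()

# str.format runs before logger.debug is even called.
logger.debug("value: {0}".format(eager))

# The logging module only formats if the record will be emitted.
logger.debug("value: %s", lazy)

print(eager.renders, lazy.renders)
```

The eager object is rendered once even though nothing is logged, while the lazy one is never rendered.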

Taking both errors into account, we should have used the exception function instead of the error function on the logger, as well as the built-in formatting.  Given both of these, the code becomes:
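As a sketch of that final form, using a hypothetical `parse_port` helper and an in-memory stream (both assumptions made for the example, written for Python 3):

```python
import io
import logging

logger = logging.getLogger("fixed_demo")  # hypothetical logger name
logger.propagate = False

# Capture output in memory so the logged traceback can be inspected.
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))


def parse_port(text):
    """Hypothetical helper: convert text to an int port, or None on failure."""
    try:
        return int(text)
    except ValueError:
        # logger.exception logs at ERROR and appends the stack trace;
        # the %r argument is only formatted if the record is emitted.
        logger.exception("Could not parse port %r", text)
        return None


result = parse_port("abc")
logged = stream.getvalue()
```

There is no need to pass the exception object into the message, since the traceback already carries the exception type and text.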

More Data

This scenario led me to quantify the cost of the mistake in execution time.  The first set of data covers logging alone; the second set extends to timing the different string formatting options.  As the data shows, calling format is a good bit slower than relying on the built-in “old-style” formatting in the logging package.  While it will add up, it isn’t a world-ending difference if done on a small scale.  Again, the time difference is largely due to the fact that no formatting takes place unless the message has a high enough level.  This data prompted me to extend my study to timing the two formatting options on their own.  There, as the data shows, the “old style” is marginally slower than the format style.

Comparing old style to new style string formatting
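If you want to run this comparison on your own machine, a minimal timeit sketch follows. It is not the code behind the measurements above; the sample values, function names, and repetition count are assumptions.

```python
import timeit

name, count = "widget", 42  # hypothetical sample values


def old_style():
    # printf-style ("old style") formatting with the % operator
    return "%s has %d items" % (name, count)


def new_style():
    # str.format ("new style") formatting
    return "{0} has {1} items".format(name, count)


timings = {
    "old style": timeit.timeit(old_style, number=100000),
    "new style": timeit.timeit(new_style, number=100000),
}

for label, seconds in timings.items():
    print("%-10s %.4f s" % (label, seconds))
```

Results depend heavily on the interpreter and version, so treat any single run as indicative only.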

 

In the end

You should use the functionality the API gives you.  In most cases, and certainly in the case of Python, it has been engineered to work, be fast, and be maintainable.  For more information on the logging module, check the Python docs for 2.7 or 3.5.