tips

Python Logging – Best Practices

The python logging module offers a wide variety of logging options and handlers.  One thing missing from the documentation is when to use each level.

A quick foreword

You really should familiarize yourself with the logging package.  How to create new loggers (I find creating them by module very useful).  There are many ways to configure logging, I tend to like dictConfig from logging.config (but start off with basicConfig form logging).

A Word on Optimal Setups

I prefer to setup my logging with each module having its own logger.  This allows me to configure logging levels at a package and/or module level.  I typically do the following in each module to create a logger.

Assuming my package structure consists of the following:

– foo (package)
—– core (module)
—– bar (module)

We can configure varying levels of logging for each element, as seen in the following snippet from a dictConfig.

In this example, the root ( ” ) logger (those not configured by any other settings) reports INFO level and up messages.  With the exception of the bar module, the foo package only reports WARNING level and up messages.  The bar module is set to a more verbose DEBUG level, to show information needed for debugging.

Selecting A Log Message Level

Out of the box, there are six default logging levels recognized by the logging module, most are self-explanatory.  I’ll just make some notes about usage.  (From here on out, I’ll refer to my logging instance as logger.)

For general status messages, you should use logger.info (INFO).  For errors, use either logger.critical (CRITICAL) or logger.error (ERROR).  For all exceptions, use logger.exception (ERROR).  logger.exception will automatically include stack trace information about the exception for you in the log. When you want verbose debugging information, use logging.debug (DEBUG)

In Closing

  • Use the logging module instead of print statements.
  • Always use logger.exception for logging exceptions.
  • Favor logger.debug for verbose log statements.
  • Favor logger.info for most other log statements (with the exception of errors).
  • Don’t forget that each of the logging functions uses C-style formatting.
Posted by Chad Dotson in Doing Things Better, Key Concepts, Programming, Software Engineering, Technology, 0 comments

Authorizing a twitter-bot

So last night I ran into a small issue, how to authorize a twitter bot to use an application without stubbing together a website and logging in with the bot account?  The answer, this little script using twython:

Source: https://gist.github.com/moonmilk/035917e668872013c1bd#gistcomment-1398946

Posted by Chad Dotson in Programming, Programming Live Blog, 0 comments

The Python “in” Operator – Theoretical vs Actual Time Complexity

Background

Sometimes we may generate or retrieve a list, set or even dict when creating collection of things that we will be testing against.  Theoretically a set, frozenset or dictionary should be the fastest (and equivalent) forms of storage for this operation.  However, what I’ve found is that on some systems the set is faster and on others dict is faster.  This difference maybe very important if your writing real-time or close to real-time software.  So where am I going with this?

Big-O Notation – Advertised Complexity

Python has published the expected time complexities of their collection types.  I’ve copied the ones for the in operator below.  These Big-O numbers are exactly what you would expect since everything but a list is implemented using a hashing algorithm.  It should be noted, however, that the speed of the set, frozenset, and dict can be compromised if the objects stored do not implement a good hashing algorithm.

Type Average Worst
list O(n)
set O(1) O(n)
frozenset O(1) O(n)
dict O(1) O(n)


More: Python Time Complexity

What I Found

Going back to my statement above, I found that on certain machines, python sets were faster and on some machines python dicts where faster.  I cannot replicate sets being faster in all cases directly so I tried to replicate it with a RHEL 7.1 machine on AWS.  Given that I was at an optimal case for the collection (no collisions), I would have thought that set, frozenset, and dict at least performed on par with each other.  I was surprised to find with the default python interpreter my tests showed that python dicts are actually faster.  So, I reran the tests with the corresponding version of PyPy and found that the expected results hold true and set and frozenset operate at virtually the same speed as dicts.  I suspect the primary reasons for the differences are the compiler used to create the python binaries.  It was interesting however that PyPy performed as expected on all systems.

The Data

I ran the benchmarks on OSX, Ubuntu 14.04, and RHEL 7.1 (Courtesy of AWS Free Tier);  Though, I opted not to record the RHEL results as they are similar to the Ubuntu results.

Benchmarks Fastest % Difference
OSX
Python
list 5.47 150.641
set 0.85 9.877
frozenset 0.85 9.877
dict 0.77 0.77 0.000
PyPy
list 0.34 89.362
set 0.13 0.13 0.000
frozenset 0.13 0.13 0.000
dict 0.13 0.13 0.000
Ubuntu
Python
list 6.07 123.733
set 1.44 0.697
frozenset 1.49 4.110
dict 1.43 1.43 0.000
PyPy
list 0.78 102.913
set 0.25 0.25 0.000
frozenset 0.25 0.25 0.000
dict 0.26 3.922

Recommendations

If you have a need to create a collection to test for existence like in this example; favor set, frozenset or dict whichever makes sense for your situation.  If you are working with a list your given and you want to speedup the system, you can consider changing the list to a set.

The Code

I’ve uploaded all the code to github.  It is available here: https://github.com/chaddotson/container-membership-benchmark/.

Posted by Chad Dotson in Doing Things Better, Programming, Software Engineering, Tips, 0 comments

Python Logger

The Scenario

This scenario illustrates two possible mistakes people make when using the python logging module.  Analyze the following code and look for issues.

So what is wrong with that?

First and foremost, the code fails to use the existing Logging.exception function that could and in most cases should be used when logging exceptions.  That function will automatically add all the exception info to the log, meaning that you will have the stack trace!  Secondly, this sample used the string.format function to format the log message for the logging library when the logging library can in fact handle string formatting itself via old style format specifiers.

Fixing it

If I were to ignore the first problem, the following code is what I should have written.  The benefit here is that the formatting is only executed if the log message is to be captured, unlike the first method.

Taking both errors into account, we should have used the exception function instead of the error function on the logger as well as the built in formatting.  Given both of these, the code becomes.

More Data

This scenario led me to quantifying the error in execution time.  The first set of data is related to logging alone; the second set extends to timing the different string formatting options.  As you can see by the data, using the format is a good bit slower than the built-in “old-style” formatting in the logging package.  While it will add up, it isn’t a world ending difference if done on a small scale.  Again, the time difference is largely due to the fact that no formatting takes place unless the message has a high enough level.  This data caused me to extend my study into timing the two different formatting options.  As you can see by the data, the “old style” is marginally slower than the format style.

Comparing old style to new style string formatting

 

In the end

You should use functionality the API gives you.  In most cases, and the case with python, it has been engineered to work, be fast and be maintainable.  For more information on the logging module, check the python docs.  2.7 or 3.5.

Posted by Chad Dotson in Doing Things Better, Programming, Software Engineering, Technology, Tips, 0 comments

Isolate, Understand, Implement

Isolate, understand, implement: three very important things to remember when adding, modifying or replacing features on a project.  The first step is to isolate the feature.  Hopefully, your project is designed in such a way that once the code is isolated it is in a single place.  The next step is understand.  Meaning understand what the code currently does.  Run the unit tests.  Make sure you know without a doubt why the code works and what the code does.  Also understand any effects that your changes might have.  And finally, implement the new code or changes.  During this phase you’ll modify the unit test set as appropriate, make the necessary changes to the code base, and ensure that the application is stable before pushing it back to your repository.

Posted by Chad Dotson in Doing Things Better, Key Concepts, Programming, Software Engineering, 0 comments