Programming

Python CSV Module Oddity

A Python Oddity

I was using Python to encode a CSV file using a custom dialect recently when I noticed something odd. I noticed that the csv writer class takes an optional argument that enables you to change the line terminator.  That’s ok, however, the csv reader class does not honor the argument. So, through the default api, you can create CSV that you cannot read back in with the default api. According to the documentation, this applies to Python 2.7 through 3.7.  It also probably applies to versions < 2.7 but that documentation is no longer online.

Python.org Documentation

A Simple Script

I wrote the following simple script to illustrate this oddity.  Basically it has a list of lists that it converts to CSV and back using various line terminators, if the output differs from the source, the difference is displayed.

Results

As you can see from the results here; with the exception of ‘\r\n’, all the various line terminators  failed.

Closing Thoughts

So, why would you want to specify the line terminators with the CSV module? There are probably only a handful of reasons, my only reason would be a custom dialect where I wanted to ensure that there were no carriage returns or line feeds in.  99.9% of the time, you want ‘\r\n’.

Posted by Chad Dotson in Programming, Software Engineering, 0 comments

Thoughts On Team Communication

Awhile back I had some thoughts on communication. If you’ve ever played World of Tanks Blitz you’d know that basically its a team of tanks against another team of tanks. With the pick up, fast paced nature communication is minimal at best (sometimes limited to a single “<<<<<<<<<” or “>>>>>>>>>” indicating which direction to take the offense). I found that teams that could coordinate with minimal communication, play their tank roles (scouts, mediums, heavies, and destroyers), and move fast could achieve massive overwhelming victories. Something similar is probably true in an agile/teamwork environment. Know your stuff, know your role, take opportunities, work together, succeed.

Posted by Chad Dotson in Doing Things Better, Software Engineering, 0 comments

Odd Behavior in Python 2.7

I was tinkering around with replacing the print statement with the print function in a Python 2 script when I ran across this peculiar oddity.

Notice that sequence is just importing the future print function as different names in each iteration.  The oddity is that the first import fails but the third (which is exactly the same) succeeds after performing the second.

** Note: I don’t know if replacing print with the print from the futures module is a wise thing to do.  I was simply using it while trying out some code. **

Versions tested: Python 2.7.10 (OSX), Python 2.7.6 (Ubuntu 14.04).

Posted by Chad Dotson in Programming, 0 comments

You’re Too Close

Have you encountered the following scenario?

You are trying to solve a problem (or helping solve a problem) and know or at least think you know the solution.  You are in the middle of implementing it when someone else looks at it and says, “why don’t you do it this way, isn’t this way easier/better?”  Taking a step back, you realize that the question not only has merit but is a better and much more obvious solution; you can’t believe you missed it.

What happened?

I think its because you were too close to the problem and had developed a very narrow focus.  That narrow focus prevented you from seeing the better solution.  Perhaps this is even a variation of functional fixedness in that we’ve latched onto an idea of how to solve a problem and our mind’s may not see alternatives easily.

What can we do?

  • Think about the broad (or product) level goals regularly.
  • Entertain questions and/or suggestions from others.
  • Ask: “Is this the best way?”
  • Ask: “Is this the practical way?”
  • Don’t overthink the problem.
  • Get it working then evaluate the solution and/or do a code review!
Posted by Chad Dotson in Doing Things Better, Programming, Tips, 0 comments

Python Logging – Best Practices

The python logging module offers a wide variety of logging options and handlers.  One thing missing from the documentation is when to use each level.

A quick foreword

You really should familiarize yourself with the logging package.  How to create new loggers (I find creating them by module very useful).  There are many ways to configure logging, I tend to like dictConfig from logging.config (but start off with basicConfig form logging).

A Word on Optimal Setups

I prefer to setup my logging with each module having its own logger.  This allows me to configure logging levels at a package and/or module level.  I typically do the following in each module to create a logger.

Assuming my package structure consists of the following:

– foo (package)
—– core (module)
—– bar (module)

We can configure varying levels of logging for each element, as seen in the following snippet from a dictConfig.

In this example, the root ( ” ) logger (those not configured by any other settings) reports INFO level and up messages.  With the exception of the bar module, the foo package only reports WARNING level and up messages.  The bar module is set to a more verbose DEBUG level, to show information needed for debugging.

Selecting A Log Message Level

Out of the box, there are six default logging levels recognized by the logging module, most are self-explanatory.  I’ll just make some notes about usage.  (From here on out, I’ll refer to my logging instance as logger.)

For general status messages, you should use logger.info (INFO).  For errors, use either logger.critical (CRITICAL) or logger.error (ERROR).  For all exceptions, use logger.exception (ERROR).  logger.exception will automatically include stack trace information about the exception for you in the log. When you want verbose debugging information, use logging.debug (DEBUG)

In Closing

  • Use the logging module instead of print statements.
  • Always use logger.exception for logging exceptions.
  • Favor logger.debug for verbose log statements.
  • Favor logger.info for most other log statements (with the exception of errors).
  • Don’t forget that each of the logging functions uses C-style formatting.
Posted by Chad Dotson in Doing Things Better, Key Concepts, Programming, Software Engineering, Technology, 0 comments

Installing Technical Analysis Library for Python

I’m tinkering with some financial analysis scripts so when I got to looking into some useful python packages, Technical Anaysis Library popped up.  The python bindings require the TA Lib (Technical Analysis Library) which on osx is available via homebrew.  Now, when I originally installed I didn’t want to install it globally so I’ve got the less preferred, local install setup.  This local install results in the following necessary commands to get the pip package to install correctly.

Now that I’ve brew installed TA Lib and set the new include and library path, I can install the python bindings via pip.

 

Posted by Chad Dotson in Programming, 2 comments

Thoughts on MVP

We all know that getting to a minimum viable product (MVP) is a race.  It is a race against competition, market need, industry direction, etc, etc etc.  I came to a realization recently that reaching MVP can also be a race against yourself and how long your technological choices hold out.

For example let us say that I have chosen a specific UI framework.  I chose it because it satisfied a good portion of my initial requirements out of the box, was well established, and generally well maintained.  This UI framework saves me a a lot of time and money along with lets me get something up and going quickly.  So now that I’ve selected a UI framework and have done some work, the requirements grow and evolve and the framework begins showing its age.  Assuming that this happens before I reach MVP, I am left with a bit of a problem: reevaluate and possibly retool the system or keep moving forward.  Retooling the system will mean a step backward and slowdown my timeline to market, but continuing means incurring more debt that will have to be recuperated later.  What is the right choice in this situation?  I believe you have to play it by ear, but favor sticking with your choices for as long as possible.

Posted by Chad Dotson in Programming, Software Engineering, Technology, 0 comments

The Python “in” Operator – Theoretical vs Actual Time Complexity

Background

Sometimes we may generate or retrieve a list, set or even dict when creating collection of things that we will be testing against.  Theoretically a set, frozenset or dictionary should be the fastest (and equivalent) forms of storage for this operation.  However, what I’ve found is that on some systems the set is faster and on others dict is faster.  This difference maybe very important if your writing real-time or close to real-time software.  So where am I going with this?

Big-O Notation – Advertised Complexity

Python has published the expected time complexities of their collection types.  I’ve copied the ones for the in operator below.  These Big-O numbers are exactly what you would expect since everything but a list is implemented using a hashing algorithm.  It should be noted, however, that the speed of the set, frozenset, and dict can be compromised if the objects stored do not implement a good hashing algorithm.

Type Average Worst
list O(n)
set O(1) O(n)
frozenset O(1) O(n)
dict O(1) O(n)


More: Python Time Complexity

What I Found

Going back to my statement above, I found that on certain machines, python sets were faster and on some machines python dicts where faster.  I cannot replicate sets being faster in all cases directly so I tried to replicate it with a RHEL 7.1 machine on AWS.  Given that I was at an optimal case for the collection (no collisions), I would have thought that set, frozenset, and dict at least performed on par with each other.  I was surprised to find with the default python interpreter my tests showed that python dicts are actually faster.  So, I reran the tests with the corresponding version of PyPy and found that the expected results hold true and set and frozenset operate at virtually the same speed as dicts.  I suspect the primary reasons for the differences are the compiler used to create the python binaries.  It was interesting however that PyPy performed as expected on all systems.

The Data

I ran the benchmarks on OSX, Ubuntu 14.04, and RHEL 7.1 (Courtesy of AWS Free Tier);  Though, I opted not to record the RHEL results as they are similar to the Ubuntu results.

Benchmarks Fastest % Difference
OSX
Python
list 5.47 150.641
set 0.85 9.877
frozenset 0.85 9.877
dict 0.77 0.77 0.000
PyPy
list 0.34 89.362
set 0.13 0.13 0.000
frozenset 0.13 0.13 0.000
dict 0.13 0.13 0.000
Ubuntu
Python
list 6.07 123.733
set 1.44 0.697
frozenset 1.49 4.110
dict 1.43 1.43 0.000
PyPy
list 0.78 102.913
set 0.25 0.25 0.000
frozenset 0.25 0.25 0.000
dict 0.26 3.922

Recommendations

If you have a need to create a collection to test for existence like in this example; favor set, frozenset or dict whichever makes sense for your situation.  If you are working with a list your given and you want to speedup the system, you can consider changing the list to a set.

The Code

I’ve uploaded all the code to github.  It is available here: https://github.com/chaddotson/container-membership-benchmark/.

Posted by Chad Dotson in Doing Things Better, Programming, Software Engineering, Tips, 0 comments