Programming

Thoughts on MVP

We all know that getting to a minimum viable product (MVP) is a race: against the competition, against market need, against industry direction.  I came to a realization recently that reaching MVP can also be a race against yourself and against how long your technology choices hold out.

For example, let us say that I have chosen a specific UI framework.  I chose it because it satisfied a good portion of my initial requirements out of the box, was well established, and was generally well maintained.  This UI framework saves me a lot of time and money and lets me get something up and running quickly.  Now that I’ve selected a UI framework and done some work, the requirements grow and evolve, and the framework begins showing its age.  Assuming that this happens before I reach MVP, I am left with a bit of a problem: reevaluate (and possibly retool) the system, or keep moving forward.  Retooling the system means a step backward and a slower timeline to market, but continuing means incurring more technical debt that will have to be repaid later.  What is the right choice in this situation?  I believe you have to play it by ear, but favor sticking with your choices for as long as possible.

Posted by Chad Dotson in Programming, Software Engineering, Technology, 0 comments

The Python “in” Operator – Theoretical vs Actual Time Complexity

Background

Sometimes we generate or retrieve a list, set, or even dict when creating a collection of things that we will be testing against.  Theoretically, a set, frozenset, or dictionary should be the fastest (and equivalent) forms of storage for this operation.  However, what I’ve found is that on some systems the set is faster and on others the dict is faster.  This difference may be very important if you’re writing real-time or near-real-time software.  So where am I going with this?

Big-O Notation – Advertised Complexity

Python publishes the expected time complexities of its collection types.  I’ve copied the ones for the in operator below.  These Big-O numbers are exactly what you would expect, since everything but a list is implemented using a hashing algorithm.  It should be noted, however, that the speed of set, frozenset, and dict can be compromised if the stored objects do not implement a good hash function.

Type       Average   Worst
list       O(n)      O(n)
set        O(1)      O(n)
frozenset  O(1)      O(n)
dict       O(1)      O(n)


More: Python Time Complexity
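To illustrate the hashing caveat above, here is a minimal sketch (the class name is hypothetical) of how a bad hash function degrades set membership from O(1) toward O(n): every instance lands in the same bucket, so lookups fall back to linear comparison.

```python
# A class whose __hash__ collides for every instance degrades
# set/dict membership toward the O(n) worst case.
class BadHash:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 1  # every instance lands in the same hash bucket

    def __eq__(self, other):
        return isinstance(other, BadHash) and self.value == other.value


bucket = {BadHash(i) for i in range(1000)}
print(BadHash(999) in bucket)  # True, but found by linear comparison
```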

What I Found

Going back to my statement above, I found that on certain machines Python sets were faster and on others Python dicts were faster.  I could not replicate sets being faster in all cases directly, so I tried to reproduce it on a RHEL 7.1 machine on AWS.  Given that I was at an optimal case for the collection (no hash collisions), I would have thought that set, frozenset, and dict would at least perform on par with each other.  I was surprised to find that with the default Python interpreter my tests showed dicts are actually faster.  So, I reran the tests with the corresponding version of PyPy and found that the expected results hold true: set and frozenset operate at virtually the same speed as dict.  I suspect the primary reason for the difference is the compiler used to build the Python binaries.  It was interesting, however, that PyPy performed as expected on all systems.

The Data

I ran the benchmarks on OSX, Ubuntu 14.04, and RHEL 7.1 (courtesy of the AWS Free Tier), though I opted not to record the RHEL results since they were similar to the Ubuntu results.

OSX – Python
  Type       Time    % Difference
  list       5.47    150.641
  set        0.85    9.877
  frozenset  0.85    9.877
  dict       0.77    0.000 (fastest)

OSX – PyPy
  Type       Time    % Difference
  list       0.34    89.362
  set        0.13    0.000 (fastest)
  frozenset  0.13    0.000 (fastest)
  dict       0.13    0.000 (fastest)

Ubuntu – Python
  Type       Time    % Difference
  list       6.07    123.733
  set        1.44    0.697
  frozenset  1.49    4.110
  dict       1.43    0.000 (fastest)

Ubuntu – PyPy
  Type       Time    % Difference
  list       0.78    102.913
  set        0.25    0.000 (fastest)
  frozenset  0.25    0.000 (fastest)
  dict       0.26    3.922

Recommendations

If you need to create a collection to test for existence, as in this example, favor set, frozenset, or dict, whichever makes sense for your situation.  If you are working with a list you were given and you want to speed up the lookups, consider converting the list to a set.
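As a quick sketch of that recommendation (the sizes and numbers here are arbitrary, not the benchmark from the repository), converting a list to a set before doing repeated membership tests pays off quickly:

```python
import timeit

# A hypothetical lookup-heavy workload: many membership tests
# against the same collection of 10,000 items.
haystack_list = list(range(10_000))
haystack_set = set(haystack_list)  # one-time conversion cost

# Probe for the worst-case element (the last one in the list).
list_time = timeit.timeit(lambda: 9_999 in haystack_list, number=1_000)
set_time = timeit.timeit(lambda: 9_999 in haystack_set, number=1_000)

print(set_time < list_time)  # hash lookup beats the linear scan
```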

The Code

I’ve uploaded all the code to GitHub.  It is available here: https://github.com/chaddotson/container-membership-benchmark/.

Posted by Chad Dotson in Doing Things Better, Programming, Software Engineering, Tips, 0 comments

Python Logger

The Scenario

This scenario illustrates two common mistakes people make when using the Python logging module.  Analyze the following code and look for issues.
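The snippet from the original post isn’t reproduced in this extract; a minimal sketch containing both mistakes (the function and names are hypothetical) might look like this:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def risky_division(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        # Mistake 1: logger.error drops the traceback that
        # logger.exception would capture automatically.
        # Mistake 2: str.format builds the message eagerly, even
        # when the record would be filtered out by the log level.
        logger.error("division failed: {0}".format(e))
        return None


risky_division(1, 0)
```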

So what is wrong with that?

First and foremost, the code fails to use the existing logging.exception function, which could, and in most cases should, be used when logging exceptions.  That function automatically adds the exception info to the log, meaning that you get the stack trace!  Secondly, this sample used str.format to build the log message when the logging library can handle string formatting itself via old-style format specifiers.

Fixing it

Ignoring the first problem, the following code is what I should have written.  The benefit here is that the formatting is only executed if the log message is actually going to be emitted, unlike the first version.
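The corrected listing isn’t shown in this extract; a sketch of the lazy, old-style formatting (using a hypothetical division helper) might be:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def risky_division(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        # Old-style format specifier plus arguments: the string is
        # only built if this record passes the logger's level check.
        logger.error("division failed: %s", e)
        return None
```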

Taking both errors into account, we should have used the logger’s exception function instead of its error function, as well as the built-in formatting.  With both fixes, the code becomes the following.
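Again the original listing is missing here; a sketch with both fixes applied (same hypothetical helper) might look like:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def risky_division(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and appends the full
        # traceback automatically, and the formatting stays lazy.
        logger.exception("division of %r by %r failed", a, b)
        return None
```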

More Data

This scenario led me to quantify the difference in execution time.  The first set of data covers logging alone; the second extends to timing the different string-formatting options.  As the data shows, using str.format in the log call is a good bit slower than the built-in “old-style” formatting in the logging package.  While it adds up, it isn’t a world-ending difference at small scale.  Again, the time difference is largely due to the fact that no formatting takes place unless the message’s level is high enough.  This data led me to extend the study to timing the two formatting options on their own, where the “old style” is only marginally slower than the format style.

Comparing old style to new style string formatting
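A rough sketch of that comparison (the message and iteration count are arbitrary; absolute numbers will vary by machine and interpreter):

```python
import timeit

# Time %-formatting against str.format on an identical message.
old_style = timeit.timeit(lambda: "value: %d, name: %s" % (42, "x"),
                          number=100_000)
new_style = timeit.timeit(lambda: "value: {}, name: {}".format(42, "x"),
                          number=100_000)

print("old style: %.3fs  format: %.3fs" % (old_style, new_style))
```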


In the end

You should use the functionality the API gives you.  In most cases, and certainly in Python’s, it has been engineered to work, to be fast, and to be maintainable.  For more information on the logging module, check the Python docs for 2.7 or 3.5.

Posted by Chad Dotson in Doing Things Better, Programming, Software Engineering, Technology, Tips, 0 comments

Not Invented Here, Not Written By Me, and Reinventing The Wheel

Not invented here and not written by me are both driving factors in reinventing the wheel when developing software.

We limit ourselves if we do not build upon the achievements of others. – Chad Dotson

Not Invented Here

I’m sure everyone has encountered developers that would prefer to implement everything themselves instead of using a library.  An example would be not using jQuery or underscore (or comparable libraries) on a web project.

This is a serious problem for several reasons.

  • It needlessly increases development time.
  • It potentially leads to less robust code and/or increased testing time.
  • It potentially leads to less maintainable code.

I’m not saying that libraries should always be preferred over your own code, but they should be strongly considered.  If you choose to re-implement what a library gives you, you should prepare some defensible reasons for not going with the library.

More: Wikipedia

Not Written by Me

This is a more refined, narrower case of Not Invented Here.  Developers who don’t want to spend the time, or who have difficulty understanding code written by others, often reimplement it because they view that as the simpler solution.  It rarely is, and they hurt their overall code quality and momentum for it.

Some common things you will hear are:

  • I don’t know what that code does.
  • I would spend less time rewriting it.  (Which is most likely false.)

The Core of the Issue

As I’ve said, I think the core of the issue is that we find it harder to understand what someone else writes than what we write ourselves.  We must apply programming best practices and resist the urge to reimplement the past.  To grow, we must push past our tendencies and continue moving forward to bigger and better things.

Posted by Chad Dotson in Doing Things Better, Key Concepts, Programming, Software Engineering, 0 comments

C-Style Unions And Python

So, you’re creating a C union (used to build a variant type), writing it to a file, socket, etc., and you want to read it in Python.  There are two ways to deal with this.

Assume the following union definition in C.

In C, reading the value it represents is easy: since it’s 4 bytes, you simply read 4 bytes and then reference the appropriate member.  In Python, if you’re looking for functionality that closely matches C, it seems less straightforward.

struct.pack and struct.unpack

The first thing to try is the struct module, to see if pack and unpack come close to doing what you want.  The problem with pack and unpack is that they require a data type up front.
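The original listing isn’t shown here; a sketch of the struct-based approach (the type tags and helper are hypothetical) might be:

```python
import struct

# With struct.unpack you must know the type up front, so the code
# ends up branching on some out-of-band type tag.
def read_value(raw, type_tag):
    if type_tag == "int":
        return struct.unpack("<i", raw)[0]
    elif type_tag == "float":
        return struct.unpack("<f", raw)[0]
    raise ValueError("unknown type tag: %r" % (type_tag,))


# A hypothetical 4-byte payload read from a file or socket.
payload = struct.pack("<f", 3.14)  # pretend the sender wrote a float
print(read_value(payload, "float"))
```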

This works as well as anything and is completely straightforward; the big problem here is speed.  First, we have to wrap each call to unpack in an if to pick the appropriate format.  Second, it’s faster in Python to unpack arrays than single values.

A ctypes addition to struct.pack and struct.unpack

Using ctypes, you can approach functionality similar to C’s.  Take the following code, for example.
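Again the original listing is missing; a sketch of the ctypes approach, mirroring the assumed C union (field names are assumptions), might be:

```python
import ctypes
import struct

# Python mirror of the assumed C union:
#   union variant { int32_t i; float f; };
class Variant(ctypes.Union):
    _fields_ = [("i", ctypes.c_int32),
                ("f", ctypes.c_float)]


# Always unpack the raw bytes as an integer into the integer member,
# then read whichever member is appropriate -- just as C would.
raw = struct.pack("<f", 2.5)       # 4 bytes that really hold a float
v = Variant()
v.i = struct.unpack("<i", raw)[0]  # store the bits, not the value
print(v.f)                         # reinterpreted as a float -> 2.5
```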

Notice that it is always unpacking the data as an integer into the integer part of the union.  This approach has a few advantages.  One, it functions the same as the C version of the code would.  Two, you can unpack entire arrays at once which can be faster.

Conclusions

The first code sample is the simplest and most straightforward, though potentially slower.  Depending on the situation, you may want an implementation similar to the second.

Posted by Chad Dotson in Programming, 0 comments

Decisions in 1 Pomodoro

All programmers suffer from analysis paralysis at some point in their careers.  The trick is bringing it to a swift conclusion.  Next time, try this: give yourself one pomodoro (25 minutes) to analyze the problem.  After that, make a decision and go with it.  Whatever falls out, falls out.  At least you won’t be over-analyzing the problem and getting nowhere.

Posted by Chad Dotson in Doing Things Better, Misc, Programming, 0 comments

On Software Engineering

Spring

Writing from the patio on a much-deserved day off. What a day: sunny and 68 on this third day of spring. It sometimes makes it really hard to work inside in a windowless box. It’s been a busy day of a different sort, but dang, I could get used to it.

Software Engineering Is Consuming

Software engineering can be, and is, absolutely consuming work.  The short of it is, we like what we do, so we tend to focus a lot of attention on it.  We are problem solvers, designers, learners, and the list goes on.  I am of the philosophy that you should find and do something you like, because anything else is a waste.  I guess that philosophy has its pros and its cons.  Is it so bad when your day consists of writing code, solving problems, researching, and learning new things?

Achieving Balance

Everyone talks about work/life balance, and it’s true: you must always take time for yourself.  Get outside, do some walking.  It will help clear your head, and it is a good stress reliever.  Maybe you’ll come back to the task with fresh ideas and renewed vigor.  The schedule will always be there.

Personally, as of April I will have been on a diet and exercise plan for two years.  I’ve lost a lot and still need to lose more.  One of my biggest problems has been shorting myself on the time to maintain my walking and weight lifting.  I always seem to get wrapped up in something.

Having a rewarding hobby is probably a good idea.  I’m not talking about coding a side project (don’t we all seem to have an overabundance of those); I’m talking about something else entirely.  My non-programming hobby is photography.  I don’t seem to get to do it much anymore, and I find winter a dreary time to take photos.  I did have a major accomplishment in this area last fall: I officially photographed a wedding.  While I was originally scared to take on such a task, the photos turned out wonderfully.  I took the photo below today.


Bradford Pear Blooming Spring 2015


Posted by Chad Dotson in Hobbies, Key Concepts, Photography, Programming, Ramblings, Software Engineering, 0 comments

Node.js vs Python vs PyPy – A Simple Performance Comparison – Updated

[Figures: N-Queens runtime graph and results table]

Some History

This is a follow-up to my original post: Node.js vs Python vs PyPy – A Simple Performance Comparison.  This article corrects a discrepancy caused by a slight difference in the JavaScript implementation, which skewed the Node.js results.

The Algorithm

As stated in the previous article, I’ve attempted to implement the same single-threaded, brute-force, recursive algorithm in many different languages.  There is nothing overly special about this algorithm, and I’ve made no attempt to optimize it.
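For context, a minimal sketch of this kind of single-threaded, brute-force, recursive N-Queens solver (not the exact code from the repository) looks like:

```python
# Count N-Queens solutions by trying every column in each row and
# recursing; queens is a tuple of column positions, one per placed row.
def count_solutions(n, queens=()):
    row = len(queens)
    if row == n:
        return 1
    total = 0
    for col in range(n):
        # A placement is safe if no earlier queen shares the
        # column or either diagonal.
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(queens)):
            total += count_solutions(n, queens + (col,))
    return total


print(count_solutions(8))  # the classic 8x8 board has 92 solutions
```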

The Findings

Node is fast, very fast.  It easily outperforms every other implementation in the puzzle repository.  As the included charts show, the performance difference between Node.js and out-of-the-box Python is very significant, and the difference between Node.js and PyPy, while less pronounced, is still significant.

Special Notes

  • I’ve placed my source on GitHub at the following URL: https://github.com/chaddotson/puzzles.  It now contains working N-Queens puzzle implementations in JavaScript, Python, Lua, Scala, and Ruby.  There is also a version in Rust, but it needs to be updated to the latest syntax before it can run again.
  • This is just one type of algorithm; the best-performing solution might, and probably does, change depending on what type of application you are researching.  For web-server performance, Node.js is slightly better than PyPy running Tornado.
  • This algorithm is a simple brute-force algorithm; there are many faster and better ones out there.
  • See the original article, Node.js vs Python vs PyPy – A Simple Performance Comparison, for more details on memory performance.
Posted by Chad Dotson in Misc, Programming, Software Engineering, Technology, 5 comments