software engineering

Working with bytes in Python 3

Background

Sometimes you need to work at the byte level in an application. I feel that there are not enough Python examples of how to do this, and there is a lot of potential to over-complicate the solution.

This Example

I plan to cover several aspects of working with bytes in this example. I’ll cover working with the struct package, the bytearray built-in and the ctypes module.
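As a rough orientation, here is a minimal sketch of how the three pieces can fit together. It assumes a made-up 8-byte little-endian record (a 32-bit unsigned count followed by a 32-bit float); the `Header` class and the values are illustrative only.

```python
import ctypes
import struct

# Pack a 32-bit unsigned int and a 32-bit float, little-endian (8 bytes).
packed = struct.pack("<If", 513, 1.5)

# A bytearray is a mutable byte buffer, so individual bytes can be changed.
buf = bytearray(packed)
buf[0] = 0x02  # low byte of the count: 0x0201 becomes 0x0202

# Unpack the modified bytes back into Python values.
count, value = struct.unpack("<If", buf)


class Header(ctypes.LittleEndianStructure):
    """Hypothetical record layout matching the "<If" format above."""
    _fields_ = [("count", ctypes.c_uint32), ("value", ctypes.c_float)]


# from_buffer shares memory with buf instead of copying it.
header = Header.from_buffer(buf)
print(count, value, header.count, header.value)
```

Because `from_buffer` shares memory, assigning to `header.count` changes the bytes in `buf` as well.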

The Code

Output

A statement on copying

Be careful when slicing a Python bytearray, bytes or array.array. Slicing creates a copy, which can hurt the performance of your application. There is a better way: memoryview. A memoryview works with anything that implements the Python buffer protocol and makes slicing very efficient. Slicing a memoryview results in another memoryview, not a copy of the underlying bytes.
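A small sketch of the difference, using an arbitrary 10-byte buffer as the example data:

```python
data = bytearray(range(10))

copied = data[2:6]            # slicing a bytearray copies those 4 bytes
view = memoryview(data)[2:6]  # slicing a memoryview copies nothing

data[2] = 99                  # change the underlying buffer

# The copy is unaffected, while the view sees the change.
print(copied[0], view[0])

# Writing through the view changes the original buffer.
view[1] = 42
print(data[3])
```

The view keeps a reference to `data`, so the bytearray cannot be resized while the view is alive.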

Extra Reading

Posted by Chad Dotson in Programming, Software Engineering, 0 comments

Thoughts On Team Communication

A while back I had some thoughts on communication. If you’ve ever played World of Tanks Blitz, you’d know that it’s basically a team of tanks against another team of tanks. Given the pick-up, fast-paced nature of the game, communication is minimal at best (sometimes limited to a single “<<<<<<<<<” or “>>>>>>>>>” indicating which direction to take the offense). I found that teams that could coordinate with minimal communication, play their tank roles (scouts, mediums, heavies, and destroyers), and move fast could achieve massive, overwhelming victories. Something similar is probably true in an agile/teamwork environment. Know your stuff, know your role, take opportunities, work together, succeed.

Posted by Chad Dotson in Doing Things Better, Software Engineering, 0 comments

You’re Too Close

Have you encountered the following scenario?

You are trying to solve a problem (or helping solve a problem) and know or at least think you know the solution.  You are in the middle of implementing it when someone else looks at it and says, “why don’t you do it this way, isn’t this way easier/better?”  Taking a step back, you realize that the question not only has merit but is a better and much more obvious solution; you can’t believe you missed it.

What happened?

I think it’s because you were too close to the problem and had developed a very narrow focus.  That narrow focus prevented you from seeing the better solution.  Perhaps this is even a variation of functional fixedness, in that we’ve latched onto an idea of how to solve a problem and our minds may not see alternatives easily.

What can we do?

  • Think about the broad (or product) level goals regularly.
  • Entertain questions and/or suggestions from others.
  • Ask: “Is this the best way?”
  • Ask: “Is this the practical way?”
  • Don’t overthink the problem.
  • Get it working then evaluate the solution and/or do a code review!

Posted by Chad Dotson in Doing Things Better, Programming, Tips, 0 comments

The Python “in” Operator – Theoretical vs Actual Time Complexity

Background

Sometimes we generate or retrieve a list, set or even dict when creating a collection of things that we will be testing against.  Theoretically a set, frozenset or dict should be the fastest (and equivalent) forms of storage for this operation.  However, what I’ve found is that on some systems the set is faster and on others the dict is faster.  This difference may be very important if you’re writing real-time or close to real-time software.  So where am I going with this?

Big-O Notation – Advertised Complexity

Python has published the expected time complexities of its collection types.  I’ve copied the ones for the in operator below.  These Big-O numbers are exactly what you would expect, since everything but a list is implemented using a hash table.  It should be noted, however, that the speed of the set, frozenset, and dict can be compromised if the objects stored do not implement a good hashing algorithm.

Type        Average   Worst
list        O(n)      O(n)
set         O(1)      O(n)
frozenset   O(1)      O(n)
dict        O(1)      O(n)
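To illustrate the caveat about hashing, here is a small sketch that counts equality comparisons during a lookup. The `Key` and `CollidingKey` classes are hypothetical and exist only for this demonstration; `CollidingKey` deliberately returns the same hash for every instance.

```python
class Key:
    eq_calls = 0  # counts equality comparisons across all keys

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.n == other.n

    def __hash__(self):
        return hash(self.n)  # well-distributed hash


class CollidingKey(Key):
    def __hash__(self):
        return 1  # deliberately bad: every instance collides


def comparisons_for_missing_lookup(key_type, size=200):
    """Build a set of `size` keys, then count __eq__ calls for an absent key."""
    items = {key_type(i) for i in range(size)}
    Key.eq_calls = 0
    found = key_type(-1) in items
    return found, Key.eq_calls


good_found, good_calls = comparisons_for_missing_lookup(Key)
bad_found, bad_calls = comparisons_for_missing_lookup(CollidingKey)
print("good hash comparisons:", good_calls)
print("colliding hash comparisons:", bad_calls)
```

With the colliding hash, the set has to compare the probe against its stored keys one by one, which is the O(n) worst case from the table.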


More: Python Time Complexity

What I Found

Going back to my statement above, I found that on certain machines Python sets were faster and on others Python dicts were faster.  I could not directly reproduce the case where sets were faster, so I tried to replicate it on a RHEL 7.1 machine on AWS.  Given that I was testing the optimal case for each collection (no collisions), I would have expected set, frozenset, and dict to perform at least on par with each other.  I was surprised to find that with the default Python interpreter my tests showed dicts to be faster.  So I reran the tests with the corresponding version of PyPy and found that the expected results held: set and frozenset operated at virtually the same speed as dict.  I suspect the primary reason for the difference is the compiler used to build the Python binaries.  It was interesting, however, that PyPy performed as expected on all systems.

The Data

I ran the benchmarks on OSX, Ubuntu 14.04, and RHEL 7.1 (courtesy of AWS Free Tier), though I opted not to record the RHEL results as they are similar to the Ubuntu results.

                Benchmark   Fastest   % Difference
OSX
  Python
    list        5.47                  150.641
    set         0.85                  9.877
    frozenset   0.85                  9.877
    dict        0.77        0.77      0.000
  PyPy
    list        0.34                  89.362
    set         0.13        0.13      0.000
    frozenset   0.13        0.13      0.000
    dict        0.13        0.13      0.000
Ubuntu
  Python
    list        6.07                  123.733
    set         1.44                  0.697
    frozenset   1.49                  4.110
    dict        1.43        1.43      0.000
  PyPy
    list        0.78                  102.913
    set         0.25        0.25      0.000
    frozenset   0.25        0.25      0.000
    dict        0.26                  3.922
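The % Difference values appear consistent with the symmetric percent difference against the fastest result (for example, 5.47 against 0.77 gives about 150.641). That formula is inferred from the numbers above, not taken from the benchmark code. A minimal sketch of such a measurement, with an assumed collection size and iteration count, might look like this:

```python
import timeit


def percent_difference(a, b):
    """Symmetric percent difference: |a - b| relative to the mean of a and b."""
    return abs(a - b) / ((a + b) / 2) * 100


def time_membership(container, probe, number=10_000):
    """Time `probe in container` over `number` repetitions."""
    return timeit.timeit(lambda: probe in container, number=number)


# Assumed test data: 1000 integers, probing for the last one
# (the worst case for the list).
values = list(range(1000))
containers = {
    "list": values,
    "set": set(values),
    "frozenset": frozenset(values),
    "dict": dict.fromkeys(values),
}

timings = {name: time_membership(c, 999) for name, c in containers.items()}
fastest = min(timings.values())

for name, elapsed in timings.items():
    print(f"{name:10} {elapsed:.4f} {percent_difference(elapsed, fastest):9.3f}")
```

Absolute numbers will vary by machine and interpreter, which is the point of the comparison above.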

Recommendations

If you need to create a collection to test for existence, like in this example, favor set, frozenset or dict, whichever makes sense for your situation.  If you are working with a list you’re given and you want to speed up the system, you can consider converting the list to a set.
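A short sketch of that conversion, using a hypothetical list of allowed values:

```python
# Hypothetical list handed to us by some other part of the system.
allowed_list = ["red", "green", "blue"]

# One-time O(n) conversion; each later membership test is O(1) on average.
allowed = frozenset(allowed_list)


def is_allowed(color):
    """Return True if color is one of the allowed values."""
    return color in allowed


print(is_allowed("green"), is_allowed("purple"))
```

The conversion itself walks the whole list once, so it pays off when the collection is tested many times rather than once.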

The Code

I’ve uploaded all the code to GitHub.  It is available here: https://github.com/chaddotson/container-membership-benchmark/.

Posted by Chad Dotson in Doing Things Better, Programming, Software Engineering, Tips, 0 comments

Not Invented Here, Not Written By Me, and Reinventing The Wheel

Not invented here and not written by me are both driving factors in reinventing the wheel when developing software.

We limit ourselves if we do not build upon the achievements of others. – Chad Dotson

Not Invented Here

I’m sure everyone has encountered developers that would prefer to implement everything themselves instead of using a library.  An example would be not using jQuery or underscore (or comparable libraries) on a web project.

This is a serious problem for several reasons.

  • It needlessly increases development time.
  • It potentially leads to less robust code and/or increased testing time.
  • It potentially leads to less maintainable code.

I’m not saying that libraries should always be preferred over your own code, but they should be strongly considered.  If you choose to re-implement what a library gives you, you should prepare some defensible reasons for not going with the library.

More: Wikipedia

Not Written by Me

This is a more refined, narrower case of Not Invented Here.  Those developers who don’t want to spend the time or have difficulty understanding code written by others often reimplement code because they view it as the simpler solution.  This is a falsity and they hurt their overall code quality and momentum for it.

Some common things you will hear are:

  • I don’t know what that code does.
  • I would spend a shorter amount of time rewriting it.  (Which is most likely a falsehood.)

The Core of the Issue

As I’ve said, I think the core of the issue is that we find it harder to understand what someone else writes vs what we write ourselves.  We must apply programming best practices and resist the urge to reimplement the past.  To grow, we must push past our tendencies and continue to move forward to bigger and better things.

Posted by Chad Dotson in Doing Things Better, Key Concepts, Programming, Software Engineering, 0 comments

C-Style Unions And Python

So, you’re creating a C union (used to create a variant type), writing it to a file, socket, etc., and want to read it in Python. There are two ways to deal with this.

Assume the following union definition in C

In C, reading the value represented by this is easy.  Since it’s 4 bytes, you simply read 4 bytes and then reference the appropriate element.  In Python, if you’re looking for functionality that closely matches C, it is not so straightforward.

struct.pack and struct.unpack

The first thing you try is to look at the struct module and see if pack and unpack can come close to doing what you want.  The problem with pack and unpack is that they require a data type up front.

This works just as well as anything and is completely straightforward; the big problem here is speed.  First, we have to wrap each call to unpack in an if to choose the appropriate option.  Second, it’s faster to pull in arrays in Python than single values.
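A minimal sketch of this approach. It assumes a 4-byte little-endian union of a 32-bit integer and a 32-bit float, plus a separate flag that says which member is active. The function name and the flag are illustrative assumptions.

```python
import struct


def read_variant(raw, is_float):
    """Interpret 4 raw bytes as either a float or a signed 32-bit int."""
    if is_float:
        return struct.unpack("<f", raw)[0]
    return struct.unpack("<i", raw)[0]


# Example data as it might arrive from a file or socket.
raw_int = struct.pack("<i", 42)
raw_float = struct.pack("<f", 1.5)

print(read_variant(raw_int, is_float=False))
print(read_variant(raw_float, is_float=True))
```

The if around each unpack call is the per-value branching described above.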

A ctypes addition to struct.pack and struct.unpack

Using ctypes, you can approach functionality similar to C.  Take the following code for example.

Notice that it is always unpacking the data as an integer into the integer part of the union.  This approach has a few advantages.  One, it functions the same as the C version of the code would.  Two, you can unpack entire arrays at once which can be faster.
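A sketch of the ctypes idea under the same assumption of a 4-byte union of a 32-bit integer and a 32-bit float. The `Variant` class, its field names, and the `to_variants` helper are hypothetical.

```python
import ctypes
import struct


class Variant(ctypes.Union):
    """Hypothetical 4-byte union of a 32-bit int and a 32-bit float."""
    _fields_ = [("as_int", ctypes.c_int32), ("as_float", ctypes.c_float)]


# Always unpack as an integer, then read whichever member is appropriate.
raw = struct.pack("<f", 1.5)
single = Variant()
single.as_int = struct.unpack("<i", raw)[0]
print(single.as_int, single.as_float)


def to_variants(raw_bytes):
    """Unpack a whole buffer of 4-byte values with one struct.unpack call."""
    count = len(raw_bytes) // 4
    ints = struct.unpack("<%di" % count, raw_bytes)
    variants = (Variant * count)()
    for index, value in enumerate(ints):
        variants[index].as_int = value
    return variants


many = to_variants(struct.pack("<3f", 1.5, 2.0, -0.5))
print([v.as_float for v in many])
```

Both members share the same 4 bytes, so the decision about how to interpret a value can be deferred until it is read.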

Conclusions

The first code sample seems to be the simplest and most straightforward though potentially slower.  However, depending on the situation, you may want an implementation similar to the second.

Posted by Chad Dotson in Programming, 0 comments

Becoming an Entrepreneur as a Software Engineer Vol 2

This is part two of my discussion and thoughts on seeking to become an entrepreneur as a software engineer.  These are my current thoughts on the process and how to achieve my overall goals in becoming an entrepreneur.  This entry centers on the belief that becoming an entrepreneur occurs in several distinct phases.  Currently, I have identified three core stages.  During each of these phases our roles and responsibilities change and grow drastically.

Stage 1: Working for someone else

This stage is the simple “do work” stage.  We work to fulfill someone else’s vision, we work to complete their goals and bring life to their ideas.  We work to solve their problems.  Since we are, at our core, problem solvers, this state is the default for most workers in industry.

Stage 2: Startup

This is the stage where we work to get our company off the ground.  We are working to fulfill our own vision.  We have identified a product or service area and are actively working to produce something.  We may or may not have a small team, but in the worst case we are the pitchman, sales rep, accountant, architect and coder all in one.  We are still the problem solvers, just with more hats to wear.

Stage 3: Liftoff

In the other stages we still participated in the day-to-day work being done; in this one we have progressed to something else.  What that may be is up to the structure of the company.  We have at the very least transitioned from problem solver at the code level to problem solver at the company level.  We are actively deciding the direction of our company and product.  We have transitioned to a form of problem creator for the people in our company.

Posted by Chad Dotson in Programming, Software Engineering, Work, 0 comments

On Software Engineering

Spring

Writing from the patio on a much deserved day off. What a day, sunny and 68 this 3rd day of spring. It sometimes makes it real hard to work inside in a windowless box. It’s been a busy day of a different sort, but dang I could get used to it.

Software Engineering Is Consuming

Software engineering can be, and is, absolutely consuming work.  The short of it is, we like what we do so we tend to focus a lot of attention on it.  We are problem solvers, designers, learners, and the list goes on.  I am of the philosophy that you should find and do something you like because anything else is a waste.  I guess that philosophy has its pros and its cons.  Is it so bad when your day consists of writing code, solving problems, researching, and learning new things?

Achieving Balance

I guess everyone talks about work/life balance and it’s true; you must always take time for yourself.  Get outside, do some walking.  It will help clear your head and it is a good stress reliever.  Maybe you’ll come back to the task with some fresh ideas and renewed vigor.  The schedule will always be there.

Personally as of April, I will have been on a diet and exercise plan for 2 years.  I’ve lost a lot and still need to lose more.  One of my biggest problems has been shorting myself on time to maintain my walking and weight lifting.  I seem to always get wrapped up in something.

Having a rewarding hobby is probably a good idea.  I’m not talking about coding a side project (don’t we all seem to have an overabundance of those); I’m talking about something else entirely.  My non-programming hobby is photography.  I don’t get to do it much anymore, it seems, and I do find winter a dreary time to take photos.  I did have a major accomplishment in this area last fall.  I officially photographed a wedding.  While I was originally scared to take on such a task, the photos turned out wonderfully.  I took the photo below today.

Bradford Pear Blooming Spring 2015

 

Posted by Chad Dotson in Hobbies, Key Concepts, Photography, Programming, Ramblings, Software Engineering, 0 comments