Fredrik Håård's Blaag

@fhaard
I'm a programmer, consultant, developer, occasional teacher and speaker. Among my least disliked programming languages is Python, and a majority of these posts are related to Python in one way or another.

You deserve practice

I enjoyed The Clean Coder by Uncle Bob, and would recommend it to any serious developer. I agree with almost everything in it, but there is one jarring exception - I disagree strongly with his view that because it's your own responsibility to practice, you should not do it on paid time.

Under the headline "Practice Ethics", he states: "Professional programmers practice on their own time. It is not your employer’s job to help you keep your skills sharp for you [...] Football fans do not (usually) pay to see players run through tires. Concert-goers do not pay to hear musicians play scales. And employers of programmers don’t have to pay you for your practice time."

But the best football clubs do pay players for training time, and the best orchestras pay their musicians to practice. I agree with one thing, and that is that an employer doesn't have to pay you for practice time, but an employer that does not want to pay for programmer training time does not deserve the best programmers.

I believe that if you have ambitions of being a great programmer you should find an employer that lets you become great. If you want to be among the best, you should find an employer that understands that hiring and keeping the best programmers means that they will need to spend time honing their skills. This is by no means black and white, but there are not that many great programmers out there, and it's a seller's market. The first step in taking responsibility for your own career is to make sure that you have an employer that deserves you.

Besides, a common trait of almost all great - or, indeed, professional - programmers I’ve met is that programming is not their only passion; and most of them work enough that doing a significant amount of training outside of work would kill their ability to pursue other interests in a meaningful way. I know that my own quality of work degrades rapidly when the workload is high enough that I have to cut down on other interests, simply because I need real down time to solve hard problems, and anything related to programming is not down time.

Blaag created 120423 08:42

Version 1.2.0 of hgapi released

hgapi is a pure-Python API for Mercurial, which uses the command-line interface for maximum compatibility. It is tested for Python 2.7 and 3.2.

Version 1.2.0 fixes a few bugs, and allows iterating over a repository as well as using slices (e.g. repo[0:'tip']) to get a set of changesets. API documentation is also slightly improved.

Special thanks to Nick Jennings and Adam Gomaa who contributed patches.

Blaag created 120413 12:40

Protocol specifications written in Python

This is a writeup of a talk I did recently at Software Passion Summit in Gothenburg, Sweden. For more background info, see the post I did prior to the conference.

Writing a specification in a full-blown programming language like Python has upsides and downsides. On the downside, Python is not designed as a declarative language, so any attempt to make it declarative (apart from just listing native data types) will require some kind of customization and/or tooling to work. On the upside, with the declaration in the language you write your servers in, you can use the specification itself rather than a generated derivative of it. Writing custom - in this case minimal - generators for other languages is also simple, since you can use Python introspection to traverse your specification, and the templating logic of your choice to generate source. This makes it possible, for example, to target a J2ME terminal that just won't accept existing solutions, and where dropping in a 150K jar file for the protocol implementation is not an alternative.

For me, this journey started around 2006 when I started to lose control over protocol documentation and protocol versions for the protocol used between terminals and servers in the fleet management solution Visual Units Logistics. After looking for, and discarding, several existing tools, and after being inspired by the fact that we usually configure Javascript in Javascript, I started to sketch (as in, ink on paper) on what a protocol specification in Python would look like. This is a transcription of what I came up with at the time:

imei = long
log_message = string
timestamp = long
voltage = float
log = Message(imei, timestamp,
           log_message, voltage)
protocol = Protocol(log, ...)
protocol.parse(data)

With this as a target, I created the first version of a protocol implementation. It looked similar to the target version, but suffered from an abundance of repetition:

#protocol.py
LOG = 0x023
ALIVE = 0x021
message = Token('message', 'String', 'X')
timestamp = Token('timestamp', 'long', 'q')
signal = Token('signal', 'short', 'h')
voltage = Token('voltage', 'short', 'h')
msg_log = Message('LOG', LOG, timestamp, signal, voltage)
msg_alive = Message('ALIVE', ALIVE, timestamp)
protocol = Protocol(version=1.0, messages=[msg_log,msg_alive])
#usage
from protocol import protocol
parsed_data = protocol.parse(data)
open('Protocol.java', 'w').write(protocol.java_protocol())

The implementation around this is simple; the Token class knows how to parse a part of a message, the Message class knows which Tokens to use (and in which order), and the Protocol class selects the correct Message instance using a mapping of marker bytes to Message instances.
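To make the division of labour concrete, here is a minimal sketch of how three such classes could fit together. It assumes fixed-size fields described by struct format characters and a one-byte marker in front of every message; the parse methods and the wire layout are assumptions made for illustration, not the production code, and the variable-length 'X' string type is left out.

```python
import struct

class Token(object):
    """Knows how to parse one fixed-size field of a message."""
    def __init__(self, name, fmt):
        self.name = name
        self.struct = struct.Struct("!" + fmt)  # network byte order

    def parse(self, data, offset):
        (value,) = self.struct.unpack_from(data, offset)
        return value, offset + self.struct.size

class Message(object):
    """Knows which tokens make up a message, and in which order."""
    def __init__(self, name, marker, *tokens):
        self.name, self.marker, self.tokens = name, marker, tokens

    def parse(self, data):
        offset, fields = 1, {}  # offset 1: skip the marker byte
        for token in self.tokens:
            fields[token.name], offset = token.parse(data, offset)
        return self.name, fields

class Protocol(object):
    """Selects the right Message using a marker -> Message mapping."""
    def __init__(self, version, messages):
        self.version = version
        self.by_marker = dict((m.marker, m) for m in messages)

    def parse(self, data):
        (marker,) = struct.unpack_from("!B", data, 0)
        return self.by_marker[marker].parse(data)

timestamp = Token('timestamp', 'q')
signal = Token('signal', 'h')
voltage = Token('voltage', 'h')
protocol = Protocol(1.0, [Message('LOG', 0x23, timestamp, signal, voltage),
                          Message('ALIVE', 0x21, timestamp)])

# A made-up LOG message: marker byte, then timestamp, signal, voltage
data = struct.pack("!Bqhh", 0x23, 1335170520, 17, 12)
print(protocol.parse(data))
```

An unknown marker byte simply raises a KeyError here; a real implementation would probably want a friendlier error.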

However, no support is given for handling multiple versions of the protocol, and the amount of name duplication makes it really cumbersome - so I set out to create a better version.

Some things complicated the creation of a better version. The worst problem of them all proved to be me, myself and I. At this time I had used Python for a couple of years, and started to get interested in the more sophisticated tools available. I had just taught myself about metaclasses, and thought they were an ingenious application of object orientation - and having found a shiny new hammer, I was itching to find a nail.

Unfortunately, I had no pressing need for using metaclasses, so I invented one - I wanted to avoid some assignments in the protocol specification, so I used metaclasses to rip out the init (constructor) method and replace it with a version that registered the instance in a global map and then called the original init method. This is wrong in at least three ways - since it was not generic, it could have been done in the init method directly; if it had been generic, it would have been a job for a decorator; and it is a really great way to obfuscate the code:

import struct
__MSG_MAPPING__ = {}
def msg_initizer(cls, old_init):
    def new_init(self, name, marker, *args):
        __MSG_MAPPING__.setdefault(cls, {})[name] = self
        __MSG_MAPPING__[cls][struct.pack("!B", marker)] = self
        old_init(self, name, marker, *args)
    return new_init
class RegisterMeta(type):
    def __new__(cls, name, bases, attrs):
        attrs['__init__'] = msg_initizer(cls,
                                         attrs['__init__'])
        return super(RegisterMeta, cls).__new__(cls,
                                      name, bases, attrs)
class Message(object):
    __metaclass__ = RegisterMeta

This is the kind of code I'm not proud of, by the way. The worst part? It didn't even remove the duplication, although it lowered it somewhat - and the global registration of messages when loading a protocol really messed up any attempt at multiple version support. This was not the only problem; I also went overboard and wanted to support specifying protocol syntax, using a Flow class that defined legal ordering of messages. This might have been a good idea had we actually had any such requirements in our protocols; since they are “authenticate, do anything”, adding support for this just expanded the codebase and made the protocol specification more complex for extremely little gain (especially since we authenticate in different ways depending on the client). Adding insult to injury, this is even more verbose than the very first try.

#In protocol.py
imei = Token('imei', 'long')
message = Token('message', 'String')
timestamp = Token('timestamp', 'long')
signal = Token('signal', 'short')
voltage = Token('voltage', 'short')
auth = Token('auth', 'String')
Markers({'LOG': 0x023,
    'ALIVE': 0x021,
    'AUTH': 0x028})
Message('LOG', imei, timestamp, signal, voltage)
Message('ALIVE', imei, timestamp)
Message('AUTH', imei, timestamp, auth)
Flow([('AUTH',), ('LOG', 'ALIVE')])
#Usage
protocol = Protocol(version=2.0)
parsed_data = protocol.parse(data) #error if not auth parsed

This entire attempt became a warning example - it shows the danger of finding new and interesting technology and applying it before grokking it, and it shows the danger of over-engineering and feature creep. Luckily, once I got a good look at what I had created, even me-a-few-years-back could see that this was an abomination, which was subsequently quietly taken out back and put down without even making it as far as integration tests.

Finally, and ongoing, I decided to apply a carefully measured amount of standard library magic to make the specifications more terse, and remove stuff that we did not need. This made the specification look something like this instead:

#In protocol_4.2.py:
#Tokens
t('message', string)
t('timestamp',  i64)
t('signal', i16)
t('voltage', i16)
#Messages
LOG = ('A log message containing some debug info',
         0x023, timestamp, message, signal, voltage)
ALIVE = ('A message to signal that the terminal is alive',
     0x021, timestamp)
#Usage
protocols = load_protocols('protocols')
parsed = protocols[4.2].parse(data)
protocols[4.2].write_java() #Writes to Protocol42.java

At one time, it was even terser (as in the earlier blog post), but that version didn’t really pan out, and the version in production is very similar to this one. Name duplication is avoided using two different techniques - the tokens are defined by calling a method t that creates the Token instance and injects it back into the calling namespace using the supplied name:

#In types.py
from inspect import currentframe
def t(name, data_type):
    """Inserts name = (name, data_type) in locals()
    of calling scope"""
    currentframe().f_back.f_locals[name] = (name, data_type)

To some, this may seem like blasphemy, but consider this - the implementation is extremely simple in concept, it gets the work done, and it is easy to explain. Another change is that the messages are created solely by using inspect to extract members of the module that look like messages - name in all caps, and a tuple. Worth noting might be that there was error handling initially, but I removed that to make parsing fail, rather than accept a specification that may or may not have contained errors.
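As a rough illustration of the "looks like a message" rule, the sketch below uses inspect.getmembers to pick out every module member whose name is all caps and whose value is a tuple. The function name find_messages and the stand-in module content are made up for the example; the real loader may well differ.

```python
import inspect
import types

def find_messages(module):
    """Return {name: tuple} for members that look like messages:
    an all-caps name bound to a tuple."""
    return dict((name, value)
                for name, value in inspect.getmembers(module)
                if name.isupper() and isinstance(value, tuple))

# Stand-in for a loaded specification module (hypothetical content)
spec = types.ModuleType('protocol_4_2')
exec("""
timestamp = ('timestamp', 'i64')
LOG = ('A log message', 0x23, timestamp)
ALIVE = ('Terminal is alive', 0x21, timestamp)
""", spec.__dict__)

print(sorted(find_messages(spec)))
```

Note that timestamp is a tuple too, but is skipped because its name is lower case; there is deliberately no error handling, in the same spirit as described above.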

Finally, Java source and HTML documentation are created by traversing the protocol instance and feeding the information into simple templates - experiments were made with literate programming, using ReST to create documentation, but in the end that tended to obfuscate rather than the reverse. This may be an effect of naive implementation, or that the problem does not lend itself well to literate programming, but either way it was not worth it in this case.
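To give an idea of how little is needed for this kind of generation, here is a small sketch that uses string.Template from the standard library to emit Java constants for the message markers. The template text, the java_constants function and the shape of the messages dict are all assumptions for the example, not the templates used in production.

```python
from string import Template

JAVA_CONSTANT = Template("    public static final int $name = $marker;")
JAVA_CLASS = Template("public class $cls {\n$body\n}\n")

def java_constants(class_name, messages):
    """messages: {name: (description, marker, ...)} (assumed layout)"""
    body = "\n".join(
        JAVA_CONSTANT.substitute(name=name, marker=hex(msg[1]))
        for name, msg in sorted(messages.items()))
    return JAVA_CLASS.substitute(cls=class_name, body=body)

messages = {'LOG': ('A log message', 0x23),
            'ALIVE': ('Terminal is alive', 0x21)}
print(java_constants('Protocol42', messages))
```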

There is a working and slightly generalized version available at bitbucket, and if you would like to hear more about this (and more details about the Python magic used), you can buy a ticket to EuroPython - you’ll have until Sunday to vote for my proposals (and others).

Blaag created 120329 10:22

PyCon 2012 - the other stuff

I have tried to do a full writeup of my PyCon experience this year, and failed miserably, so this is what I’ll do: This post will focus only on the conference experience - lessons learned, sessions attended, and projects discovered will have their own posts; this is the other stuff.

So what about the conference as a whole? It was, just like Atlanta last year, an overwhelmingly positive experience. This was the first time I volunteered, and I really felt that that was a given win - from getting to have a say in the program by joining the program committee, through doing a session as a session runner and getting to see all the work that goes on behind the scenes, to responding to a just-in-time tweet to join the swag-packing party. Just standing somewhere and looking confused would prompt someone more experienced to help you out, and people were just so genuinely nice. Will definitely do again.

The venue was good, although as others have already remarked, the open spaces were too far away from the main rooms - I believe this made both the BoFs and the hallway track a bit less exciting than last year.

Food was acceptable, and lunch was served in a timely fashion - breakfast was awesome the first day, and good the following days.

The swag was good, apart from the orange bottle opener - you know who you are. Also, so many t-shirts!

(I really need to fix proper tags for posts so that I don't have to hack a post on PyCon into the Python RSS feed used by Planet Python...)

Blaag created 120322 21:30

[rant] Dare to show your code

My name is Fredrik, and sometimes I write code I’m not that proud of.

A friend of mine started on a Python project recently, and when I asked him to put it up on Bitbucket his response was an immediate and not-quite-mock “But then people will see my code!”. I believe this fear of showing one’s code is common, and I believe that it is a problem. Not so much for open source, or anything like that, but for the individual - it suggests a belief that your code isn’t good enough, that other people’s code is better, and/or that offering your code up for others to see will lead to rejection and ridicule. I know I was wary before suggesting a patch to Python, because I feared it was not good enough (it was, but the tests weren’t - nobody was mean in telling me they needed work to conform). I had over ten repositories at Bitbucket before I open sourced the first one, and I spent too much time worrying over the autohook source before daring to make it public... for no good reason at all.

Sometimes I’m not proud of my code; I was in a hurry, I was new to the tools or the domain, I was lazy, I did not know better at the time, there were customer demands I could not fulfill in any other way, or any other reason or excuse. Sometimes somebody tells me I should clean up my code - and that is good. Having others critique your code is one of the best ways of getting better, and knowing that others will look at your code will make you (at least it makes me) write better code. This is why code review is such a powerful tool - even if the reviews seldom find serious errors, people tend to write better code when they know somebody will read it right now, as opposed to knowing that someone will be forced to read it when maintaining the code base some time in the future.

Added to this, as bad as we may feel when showing our own code, so does everybody else from time to time. Not all code will be perfect - I’d argue that if we spent the time to make all code perfect, we’d never get anything done. Besides, not everyone will agree on what perfect code entails, so it’s a fool’s errand - someone will always think there are things about your code that are imperfect. And that’s OK. Showing off my code online (some of it really ugly, like the Blaag source code) has netted me all of two immature flames, but also some clever insights, pull requests with bug fixes, and in one case even someone to discuss the code with (thanks Martijn!).

Blaag created 120302 14:01

Python Closures and Decorators (Pt. 2)

Edit: got complaints that code was hard to read, trying out Pygments.

In part 1, we looked at sending functions as arguments to other functions, at nesting functions, and finally we wrapped a function in another function. We'll begin this part by giving an example implementation of the exercise I gave in part 1:

>>> def print_call(fn):
...   def fn_wrap(*args, **kwargs):
...     print("Calling %s with arguments: \n\targs: %s\n\tkwargs:%s" % (
...            fn.__name__, args, kwargs))
...     retval = fn(*args, **kwargs)
...     print("%s returning '%s'" % (fn.__name__, retval))
...     return retval
...   fn_wrap.__name__ = fn.__name__
...   return fn_wrap
...
>>> def greeter(greeting, what='world'):
...     return "%s %s!" % (greeting, what)
...
>>> greeter = print_call(greeter)
>>> greeter("Hi")
Calling greeter with arguments:
     args: ('Hi',)
     kwargs:{}
greeter returning 'Hi world!'
'Hi world!'
>>> greeter("Hi", what="Python")
Calling greeter with arguments:
     args: ('Hi',)
     kwargs:{'what': 'Python'}
greeter returning 'Hi Python!'
'Hi Python!'
>>>

So, this is at least mildly useful, but it'll get better! You may or may not have heard of closures, and you may have heard any of a large number of definitions of what a closure is - I won't go into nitpicking, but just say that a closure is a block of code (for example a function) that captures (or closes over) non-local (free) variables. If this is all gibberish to you, you're probably in need of a CS refresher, but fear not - I'll show by example, and the concept is easy enough to understand: a function can reference variables that are defined in the function's enclosing scope.

For example, take a look at this code:

>>> a = 0
>>> def get_a():
...   return a
...
>>> get_a()
0
>>> a = 3
>>> get_a()
3

As you can see, the function get_a can get the value of a, and will be able to read the updated value. However, there is a limitation - a plain assignment inside the function does not write to the captured variable:

>>> def set_a(val):
...   a = val
...
>>> set_a(4)
>>> a
3

What happened here? Assignment inside a function creates a local variable by default, so a = val actually writes to a local variable a that shadows the module-level a that we wanted to write to. (A global a declaration in set_a, or nonlocal in Python 3 for enclosing function scopes, changes this.) One way to get around this limitation (which may or may not be a good idea) is to use a container type:

>>> class A(object): pass
...
>>> a = A()
>>> a.value = 1
>>> def set_a(val):
...   a.value = val
...
>>> a.value
1
>>> set_a(5)
>>> a.value
5
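If you are on Python 3, there is another option besides a container: the nonlocal statement tells Python that an assignment should rebind the variable in the enclosing function scope instead of creating a new local one. A small sketch, where make_counter is just a made-up example:

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count  # rebind the enclosing variable (Python 3 only)
        count += 1
        return count
    return increment

counter = make_counter()
counter()          # first call returns 1
print(counter())   # second call, prints 2
```

Each call to make_counter creates a fresh count, so two counters never share state.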

So, with the knowledge that a function captures variables from its enclosing scope, we're finally approaching something interesting, and we'll start by implementing a partial. A partial is an instance of a function where you have already filled in some or all of the arguments; let's say, for example, that you have a session with username and password stored, and a function that queries some backend layer which takes different arguments but always requires credentials. Instead of passing the credentials manually every time, we can use a partial to pre-fill those values:

>>> #Our 'backend' function
... def get_stuff(user, pw, stuff_id):
...   """Here we would presumably fetch data using the supplied
...   credentials and id"""
...   print("get_stuff called with user: %s, pw: %s, stuff_id: %s" % (
...         user, pw, stuff_id))
...
>>> def partial(fn, *args, **kwargs):
...   def fn_part(*fn_args, **fn_kwargs):
...     merged = dict(kwargs) #copy, so one call cannot affect the next
...     merged.update(fn_kwargs)
...     return fn(*args + fn_args, **merged)
...   return fn_part
...
>>> my_stuff = partial(get_stuff, 'myuser', 'mypwd')
>>> my_stuff(3)
get_stuff called with user: myuser, pw: mypwd, stuff_id: 3
>>> my_stuff(67)
get_stuff called with user: myuser, pw: mypwd, stuff_id: 67

Partials can be used in numerous places to remove code duplication where a function is called in different places with the same, or almost the same, arguments. Of course, you don't have to implement it yourself; just do from functools import partial.
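For comparison, this is roughly what the same example looks like with the standard library version. The get_stuff function here is a simplified stand-in that returns its message instead of printing it:

```python
from functools import partial

def get_stuff(user, pw, stuff_id):
    """Stand-in backend function for the example."""
    return "user: %s, pw: %s, stuff_id: %s" % (user, pw, stuff_id)

my_stuff = partial(get_stuff, 'myuser', 'mypwd')
print(my_stuff(3))
print(my_stuff(stuff_id=67))  # keyword arguments work as well
```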

Finally, we'll take a look at function decorators (there may be a post on class decorators in the future). A function decorator is (can be implemented as) a function that takes a function as parameter and returns a new function. Sounds familiar? It should, because we've already implemented a working decorator: our print_call function is ready to be used as-is:

>>> @print_call
... def will_be_logged(arg):
...   return arg*5
...
>>> will_be_logged("!")
Calling will_be_logged with arguments:
     args: ('!',)
     kwargs:{}
will_be_logged returning '!!!!!'
'!!!!!'

Using the @-notation is simply a convenient shorthand to doing:

>>> def will_be_logged(arg):
...   return arg*5
...
>>> will_be_logged = print_call(will_be_logged)

But what if we want to be able to parameterize the decorator? In this case, the function used as a decorator will receive the arguments, and will be expected to return a function that wraps the decorated function:

>>> def require(role):
...   def wrapper(fn):
...     def new_fn(*args, **kwargs):
...       if role not in kwargs.get('roles', []):
...         print("%s not in %s" % (role, kwargs.get('roles', [])))
...         raise Exception("Unauthorized")
...       return fn(*args, **kwargs)
...     return new_fn
...   return wrapper
...
>>> @require('admin')
... def get_users(**kwargs):
...   return ('Alice', 'Bob')
...
>>> get_users()
admin not in []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in new_fn
Exception: Unauthorized
>>> get_users(roles=['user', 'editor'])
admin not in ['user', 'editor']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in new_fn
Exception: Unauthorized
>>> get_users(roles=['user', 'admin'])
('Alice', 'Bob')

...and there you have it. You are now ready to write decorators, and perhaps use them to write aspect-oriented Python; adding @cache, @trace, @throttle are all trivial (and before you add @cache, do check functools once more if you're using Python 3!).
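As a hint of what is waiting in functools on Python 3: lru_cache is a ready-made caching decorator. The sketch below uses a made-up square function and a calls list to show that the second call never reaches the function body:

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=None)
def square(x):
    calls.append(x)  # record every real invocation
    return x * x

square(4)
square(4)      # served from the cache, square's body is not run
print(calls)   # prints [4]
```

Keep in mind that lru_cache needs hashable arguments, since they are used as dictionary keys.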

Blaag created 120301 19:04

Python Closures and Decorators (Pt. 1)

Since I, in retrospect, made the wrong choice when cutting down a Python course to four hours and messed up the decorator exercise, I promised the attendees that I'd make a post about closures and decorators and explain it better - this is my attempt to do so.

Functions are objects, too. In fact, in Python they are First Class Objects - that is, they can be handled like any other object with no special restrictions. This gives us some interesting options, and I'll try to move through them from the bottom up.

A very basic case of using the fact that functions are objects is to use them as you would a function pointer in C; pass it into another function that will use it. To illustrate this, we'll take a look at the implementation of a repeat function - that is, a function that accepts another function as argument together with a number, and then calls the passed function the specified number of times:

>>> #A very simple function
>>> def greeter():
...   print("Hello")
...
>>> #An implementation of a repeat function
>>> def repeat(fn, times):
...   for i in range(times):
...     fn()
...
>>> repeat(greeter, 3)
Hello
Hello
Hello
>>>

This pattern is used in a large number of ways - passing a comparison function to a sorting algorithm, passing a decoder function to a parser, and in general specializing the behaviour of a function, or passing specific parts of a job to be done into a function that abstracts the work flow (e.g. sort() knows how to sort lists, compare() knows how to compare elements).
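Sorting is probably the place where most Python programmers first meet this. Note that the built-in sorted() takes a key function (one that returns the value to sort by) rather than a comparison function, but the idea is the same: sorted() knows how to sort, and the function we pass in knows what to sort by. The two key functions below are made up for the example:

```python
def by_length(word):
    return len(word)

def by_last_letter(word):
    return word[-1]

words = ['banana', 'kiwi', 'fig']
print(sorted(words, key=by_length))       # ['fig', 'kiwi', 'banana']
print(sorted(words, key=by_last_letter))  # ['banana', 'fig', 'kiwi']
```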

Functions can also be declared in the body of another function, which gives us another important tool. In the most basic case, this can be used to "hide" utility functions in the scope of the function that uses them:

>>> def print_integers(values):
...   def is_integer(value):
...     try:
...       return value == int(value)
...     except (TypeError, ValueError): #not convertible to int
...       return False
...   for v in values:
...     if is_integer(v):
...       print(v)
...
>>> print_integers([1,2,3,"4", "parrot", 3.14])
1
2
3

This may be useful, but is hardly in itself a very powerful tool. Combined with the fact that functions can be passed as arguments, however, we can add behaviours to functions after they are constructed, by wrapping them in another function. A simple example would be to add a trace output to a function:

>>> def print_call(fn):
...   def fn_wrap(*args, **kwargs): #take any arguments
...     print("Calling %s" % (fn.__name__))
...     return fn(*args, **kwargs) #pass any arguments to fn()
...   return fn_wrap
...
>>> greeter = print_call(greeter) #wrap greeter
>>> repeat(greeter, 3)
Calling greeter
Hello
Calling greeter
Hello
Calling greeter
Hello
>>>
>>> greeter.__name__
'fn_wrap'

As you can see, we can replace the greeter function with a new function that uses print to log the call, and then calls the original function. As seen on the last two rows of the example, the function name of the function reflects that it has been replaced, which may or may not be what we wanted. If we want to wrap a function while keeping the original name, we can do so by adding a row to our print_call function:

>>> def print_call(fn):
...   def fn_wrap(*args, **kwargs): #take any arguments
...     print("Calling %s" % (fn.__name__))
...     return fn(*args, **kwargs) #pass any arguments to fn()
...   fn_wrap.__name__ = fn.__name__ #Copy the original name
...   return fn_wrap
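The standard library can do this copying for us: functools.wraps(fn) returns a function that copies the name, docstring and a few other attributes from fn onto whatever function it is given. A sketch of print_call using it, with a greeter that returns its greeting so there is something to inspect:

```python
from functools import wraps

def print_call(fn):
    def fn_wrap(*args, **kwargs):
        print("Calling %s" % fn.__name__)
        return fn(*args, **kwargs)
    # wraps(fn) copies __name__, __doc__ etc. from fn onto fn_wrap
    return wraps(fn)(fn_wrap)

def greeter():
    """Say hello"""
    return "Hello"

greeter = print_call(greeter)
print(greeter.__name__)  # prints greeter
print(greeter.__doc__)   # prints Say hello
```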

Since this is rapidly turning into a very long post, I'll stop here and return tomorrow with part two, where we'll look at closures, partials, and (finally) decorators.

Until then, if this is all new to you, use print_call as a base to create a function that will print function name and arguments passed before calling the wrapped function, and the function name and return value before returning.

Update: part 2

Blaag created 120229 20:36

Blaag using Genshi, greedy bloggers adding ads, and demographics. Also ranting.

(There will be very little technical content here; if that's what you are looking for, move along)

So, I updated Blaag to use Genshi for templating (still in its own branch), which was a pretty pleasant experience; I cleared out some of the worst code from blaag.py and ended up with a single html template instead of a host of snippets.

I also added a left column to the template (should probably make it optional) so that only blog links live in the right column. This made me go over the 960px width that I used to have, but a quick look at my visitor statistics suggests that most of the visitors won't be adversely affected by that - I still intend to add a lite/mobile alternate style, since I do want anyone to be able to read the blog.

On the topic of visitors, you are a curious bunch, but perhaps what could be expected of readers of a tech blog: 50% use Chrome, 21% Firefox, 12% Safari, 6% Android browser, 6% 'Mozilla Compatible', 2% Opera, 1.5% IE, and then a gaggle of uncommon browsers. 31% use Windows, 30% OSX, 18% Linux of some sort, and the rest are mainly mobile OSes. About 10% of users block Google Analytics, according to the server logs.

Also on the topic of visitors, I thought that hey, I have a lot of visitors, I should be able to get some cash out of the hours spent and maybe make hosting break-even (rather, move haard.se to a host of its own)! I figure that any readers that take offense at ads are in the 10% that block Analytics, and will block the ads, so nothing is stopping me.

Well, I tried to read up a bit on ad programs and ads on tech blogs, but there's not a whole lot of signal in all the noise that turns up; mainly it's advertising for ad programs, and 'this works/does not work, because I say so'. Figures, or even mildly solid-looking arguments, are missing. Right now I tossed an AdSense ad in there just because it was quick and I could force a look and feel; I suspect it will attract about zero clicks. I could allow a banner, but AFAIK I cannot then preemptively block ugly, animated or just wrong ads. In addition, it seems that the AdSense ads are synchronous, slowing down the site to show ads, which is wrong in itself; nobody visits the site to view ads, after all.

So, I'll probably have to look for another program (or just not have ads), but I have no idea what that would be, since everyone is so damn secretive about their figures. Right now I believe I have about 50K visitors a month who do not block javascript in general or Google in particular (50K since 8th of February, when the blog went live for 'real'), but I have no idea whatsoever where that puts me in the great scheme of things since people don't seem to talk about their visitor numbers, and I have no idea where I should look for ads that a) are at least somewhat interesting to developers, b) do not slow down page loading, and c) pay enough to make it worth the overhead of maintaining them for a non-webmaster.

Meh.

Blaag created 120227 17:53

Your favourite programming language is not good enough

I was blown away by the amount of response - mostly positive - on my Python is important post. However, a lot of the replies, both positive and... slightly less positive, really highlighted an issue I have with how a lot of developers seem to approach programming languages: the search for the Perfect Language to Love and Protect. Why are so many developers so very emotional when it comes to their favourite programming language? Considering that no language can (yet) magically translate the perfect idea in your head into machine code, all of them exist on a scale of badness - they all limit you more than your own thoughts or the hardware does.

I believe that the primary reason people feel the need to vehemently defend a particular language is that they are lazy. Of course, good programmers are always lazy (why else automate?), but this is a specific and very bad laziness - being too lazy to learn. If my favourite language is better than anything else, or maybe at least just as good as anything else, I don't have to spend time and effort learning new languages.

The main problem with this is not only that you won't find the perfect language, but that when you're only comfortable in one or two languages, the way you solve problems becomes limited by what's possible to do in those languages - and if the languages you know are similar and from the same paradigm, the problem gets worse.

When you choose a language to solve a problem, by all means, use the language you feel you will solve it best in - the most powerful, the most productive, the most comfortable, the one with the most libraries... but if you want to be a serious programmer or developer, rather than someone who dabbles a bit in programming, you need to learn new languages, and you need to stop believing that you found that one language that is better than the rest. All programming languages have made trade-offs, and none is perfect. I would argue that some languages are better than others, but no language is the best at everything, and no language got everything right. Python has its own problems (it's not that it is dynamically typed), so have the different Lisp dialects (it's not that they have too many parentheses), and so has Haskell (it's the indisputable fact that it is weird*).

Learn new languages. Learn not to be partisan and defend 'your' language against any criticism. If you haven't already, read Structure and Interpretation of Computer Programs, and learn some form of Lisp - it will make you see and feel the limitations of other languages, and the pain will make you a better programmer, whatever language you use.

* No, I'm not being serious. Haskell is next on my list of languages to learn.

Blaag created 120224 16:30

Python API to git: gitapi

Train rides can be good - if not for creativity, then at least for boredom-induced productivity. I had planned to make a hgapi fork that worked against git instead of Mercurial, and during the ride back from holding a Python workshop in Malmö (a three-hour trip back to Karlskrona; I still have another hour to go...), I finally did. gitapi is born, and even though it's in its infancy, it supports a large number of common operations. Fair warning - I basically just switched hg->git and then fixed the test cases that still made sense, so there's bound to be some kinks left to iron out. Either way, the test suite passes on both Python 2.7 and 3.2, so there's something.

Blaag created 120221 21:24

Explaining comprehensions to programmers

For the first year or two programming Python, I never used list comprehensions (at the time, those were the only comprehensions). I read about them, I kinda figured out how they worked, and then I stuck to map() and filter(), which I understood. Looking back, I think that this has a lot to do with the fact that explanations of comprehensions are done using their origin - mathematics - rather than the domain we use them in - programming.

A quick DuckDuckGo search tells me this is still the case - Wikipedia asks us to consider something like this: \(S = \{2 \cdot x \mid x \in \mathbb{N},\ x > 3\}\), and other sources also seem to start out with ‘this is how it's done in math, so...’ (a notable exception is the tutorial on python.org).

When talking to programmers, I’d like to explain comprehensions differently, because not all programmers have a background in mathematics. For a programmer, a list comprehension is simply a for loop for constructing lists, using a more declarative notation than your usual for loop. For those of us used to map() and filter(), list comprehensions are both of those as well.

Consider:

def loop(my_list):
  result = []
  for x in my_list:
      if x > 3:
          result.append(x*2)
  return result

Ever written code like this? This is code that explicitly states which steps should be taken to construct your list; but you don’t have to - you can instead state what you want:

def compr(my_list):
  return [x*2 for x in my_list if x > 3]

This translates to "give me value*2 for every value in my_list, but only if that value is more than three". Note also that this expression does the work of both map (multiply by two) and filter (take only values that are greater than three). The general case would be [add something to the list for each value in an iterable, optionally only if a condition is True for that value]
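For readers coming from map() and filter(), the equivalence can be checked directly. This is a minimal sketch; the name map_filter and the sample list are made up for illustration.

```python
def compr(my_list):
    # Declarative version: double every value greater than three
    return [x * 2 for x in my_list if x > 3]


def map_filter(my_list):
    # Same result with the builtins: filter first, then map.
    # list() is needed on Python 3, where map() returns an iterator.
    return list(map(lambda x: x * 2, filter(lambda x: x > 3, my_list)))


sample = [1, 2, 3, 4, 5, 6]
print(compr(sample))       # [8, 10, 12]
print(map_filter(sample))  # [8, 10, 12]
```

The comprehension reads left to right as "what to keep, from where, under which condition", while the map/filter version has to be read inside out.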

Comprehensions also work nested - consider this simple but ugly code:

def create_matrix_loop(size, default):
  new_matrix = []
  for y in range(size):
    row = []
    for x in range(size):
      row.append(default)
    new_matrix.append(row)
  return new_matrix

Sample output:

>create_matrix_loop(3, None)
[[None, None, None],
 [None, None, None],
 [None, None, None]]

Since comprehensions can be nested, this can be replaced with:

def create_matrix_compr(size, default):
  return [[default for x in range(size)] for y in range(size)]

As an added bonus, when we don’t tell the compiler how we want to do something, but rather what we want done, it can generate better - faster - bytecode for us. The loop version of create_matrix is translated into 35 bytecode instructions, and the version using a list comprehension is only 20 (try import dis; dis.dis(func) to see what func looks like in bytecode). In reality, you will often avoid making a function at all when using comprehensions since they’re terse enough on their own, making this difference even bigger. Timing the implementations, the difference is evident:

>timeit -n100 create_matrix_loop(1000, None)
100 loops, best of 3: 113 ms per loop
>timeit -n100 create_matrix_compr(1000, None)
100 loops, best of 3: 49.1 ms per loop

That's right: less code, declarative syntax, and faster execution! (Note: I used IPython when creating and timing the examples - it's awesome and you should try it)
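If you don't have IPython at hand, the same comparison can be sketched with the standard library's dis and timeit modules. The helper count_instructions is a made-up name, and the instruction counts and timings you get will differ from the figures above depending on Python version and machine.

```python
import dis
import timeit


def create_matrix_loop(size, default):
    new_matrix = []
    for y in range(size):
        row = []
        for x in range(size):
            row.append(default)
        new_matrix.append(row)
    return new_matrix


def create_matrix_compr(size, default):
    return [[default for x in range(size)] for y in range(size)]


def count_instructions(func):
    # Counts only the function's own code object; on some Python versions
    # a comprehension compiles to a nested code object that is not included.
    return sum(1 for _ in dis.get_instructions(func))


for func in (create_matrix_loop, create_matrix_compr):
    # Time 20 calls that each build a 200x200 matrix
    seconds = timeit.timeit(lambda: func(200, None), number=20)
    print(func.__name__, count_instructions(func), "instructions,",
          round(seconds, 4), "seconds for 20 runs")
```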

Blaag created 120216 11:10

Why Python is important for you

I believe that Python is important for software development. While there are more powerful languages (e.g. Lisp), faster languages (e.g. C), more used languages (e.g. Java), and weirder languages (e.g. Haskell), Python gets a lot of different things right, and right in a combination that no other language I know of has done so far.

It recognises that you’ll spend a lot more time reading code than writing it, and focuses on guiding developers to write readable code. It’s possible to write obfuscated code in Python, but the easiest way to write the code (assuming you know Python) is almost always a way that is reasonably terse, and more importantly: code that clearly signals intent. If you know Python, you can work with almost any Python code with little effort. Even libraries that add “magic” functionality can be written in perfectly readable Python (compare this to understanding the implementation of a framework such as Spring in Java).

Python also acknowledges that speed of development is important. Readable and terse code is part of this, and so is access to powerful constructs that avoid tedious repetition of code. Maintainability also ties into this - LoC may be an all but useless metric, but it does say something about how much code you have to scan, read and/or understand to troubleshoot problems or tweak behaviours.

This speed of development, the ease with which a programmer of other languages can pick up basic Python skills, and the huge standard library are key to another area where Python excels - toolmaking. Any project of size will have tasks to automate, and automating them in Python is in my experience orders of magnitude faster than using more mainstream languages - in fact, that was how I started out with Python, creating a tool to automate configuring Rational Purify for a project where it had been such a chore that it was never run (and memory leaks were not fixed). I’ve since created tools to extract information from ticket systems and present it in a way useful to the team, tools to check poms in a Maven project, Trac integration, custom monitoring tools... and a whole lot more. All of those tools have been quick to implement, saved a lot of time, and several of them have later been patched and updated by people with no Python background - without breaking.

That building custom tools is easy hints at another strength - building and maintaining custom software is easy, period. This is why, while the quite huge Django framework might be the most famous Python web framework, there is also a host of successful small and micro-frameworks. When working in a powerful programming language with a wide array of standard and third-party libraries, you often don’t need to accept the trade-offs that are necessary when using any large off-the-shelf framework. This means that you can build exactly the software your customers want, rather than telling them that “this is how it’s done, sorry”. To me, this is a huge difference. I feel ashamed when I have to tell a customer that no, sorry, this seems like a simple requirement, but the framework we use makes it impossible or prohibitively expensive to implement. Whenever this happens, you have failed. Writing software that fits into the customer’s model rather than into a framework is important, and I for one feel that a lot of developers today have lost sight of that simple fact. A lot of programmers now spend more time configuring frameworks and making excuses for their shortcomings than actually programming.

Finally, if you’re a boss-wo/man or general manager, using Python has a final benefit - Python programmers run into less frustration*, which makes them happier, and even more productive!

(*may not be true when installing source-distributed C extensions on Windows)

Blaag created 120211 10:26

Using Python to get rid of .doc

I'll be appearing at Software Passion to speak about using Python for protocol specifications, instead of writing the specification in an external document and then trying to implement it from there (or, perhaps more common, implementing it and then trying to keep the document up-to-date).

A while ago at Visual Units, the situation was this: There was a protocol to transfer data over TCP from fleet management black boxes running J2ME to a server running Python, which then stored that data so interesting things could be done with it. Accompanying the protocol was an ever-slightly-out-of-date protocol specification, and a client implementation in Python used for testing the server.

This means that we had four different implementations of the protocol: one in Java, two in Python, and one in English. If one of those was not updated when the others were, the system was no longer consistent, and might break in interesting ways.

Since this created a lot of work for me, I set out to change things. First, I searched for viable existing solutions, but the need to keep the protocol compact (telematics data transfer is expensive), and J2ME support meant I did not find anything to use off the shelf.

Instead I started to implement my own solution, with a vision that I would implement the protocol once, and use it everywhere - Java, Python, and English. In the end, using a couple of hundred lines of Python, we can now specify a protocol thus:

message = string
timestamp = i64
timediff = i32
ping = ("A ping, with a time and message",
         timestamp, message)
pong = ("A pong, with message, timestamp and perceived lag",
        timestamp, timediff, message)

...and from this, we create Java source code for the terminals, the Python clients and servers use it directly when packing and parsing messages, and the documentation for the poor souls who might want to read English instead of Python is generated.
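To give a feel for how a declaration like this can drive packing and parsing directly, here is a minimal sketch. It is not the Visual Units implementation: the Int and String classes, the pack and unpack helpers, the big-endian byte order and the 16-bit length prefix for strings are all assumptions made for illustration.

```python
import struct


class Int(object):
    # Hypothetical fixed-size integer field backed by a struct format
    def __init__(self, fmt):
        self._struct = struct.Struct(fmt)

    def pack(self, value):
        return self._struct.pack(value)

    def unpack(self, data, offset):
        (value,) = self._struct.unpack_from(data, offset)
        return value, offset + self._struct.size


class String(object):
    # Hypothetical string field: UTF-8 bytes after a 16-bit length prefix
    def pack(self, value):
        raw = value.encode("utf-8")
        return struct.pack(">H", len(raw)) + raw

    def unpack(self, data, offset):
        (length,) = struct.unpack_from(">H", data, offset)
        start = offset + 2
        return data[start:start + length].decode("utf-8"), start + length


i64 = Int(">q")   # signed 64-bit, big-endian (assumed)
i32 = Int(">i")   # signed 32-bit, big-endian (assumed)
string = String()

# The specification itself, in the same shape as above
message = string
timestamp = i64
timediff = i32
pong = ("A pong, with message, timestamp and perceived lag",
        timestamp, timediff, message)


def pack(spec, *values):
    fields = spec[1:]  # spec[0] is the human-readable description
    if len(fields) != len(values):
        raise ValueError("expected %d values" % len(fields))
    return b"".join(f.pack(v) for f, v in zip(fields, values))


def unpack(spec, data):
    offset = 0
    values = []
    for field in spec[1:]:
        value, offset = field.unpack(data, offset)
        values.append(value)
    return tuple(values)


wire = pack(pong, 1328636160, 42, "hello")
print(len(wire))           # 19 bytes: 8 + 4 + (2 + 5)
print(unpack(pong, wire))  # (1328636160, 42, 'hello')
```

Because the description string travels with the field list, the same tuple could also feed a documentation generator or a Java source template, which is the "implement once, use everywhere" idea in miniature.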

Want to know how this was made possible, see some code, and point and laugh at my miserable attempts that failed? Want to know why meta-classes were absolutely vital - or not? Register for Software Passion where I'll be talking about this - if you use the promotion code 'BLAAG' when registering, you'll even get a 10% discount!

Blaag created 120207 18:36

Mercurial in Python 3: promoting hgapi

When I took a look at the python.org Py3k poll, I saw that Mercurial was on the top list of things people wanted ported (though far behind the likes of Django). Now, I don't know why others want to tie into Mercurial from code, but if a little performance overhead from using the CLI isn't critical - such as when writing hooks, or just integrating version control in some tool or another - you might want to consider hgapi.

hgapi uses only the command line interface, and was created to be able to release autohook under a more permissive license than the GPL - and it's tested against both Python 2.7 and Python 3.2. It now supports most (for a given computation of 'most') operations in Mercurial, and as a bonus there are no open feature requests - so if there is something you miss, this is the time to request it!

Also, don't forget to register for PyCon, early bird rates until 25/1!

Blaag created 120119 17:00

What's the point of properties in Python?

A few days ago I was asked by a colleague what the point of properties in Python is. After all, writing properties is as much text as writing getters and setters, and they don't really add any functionality apart from not having to write '()' on access.

On the surface, this argument holds as we can see by comparing a simple class implemented with getters and setters, and with properties.

Implemented with getters and setters:

>>> class GetSet(object):
...   x = 0
...   def set_x(self, x):
...     self.x = x
...   def get_x(self):
...     return self.x
...
>>> getset = GetSet()
>>> getset.set_x(3)
>>> getset.get_x()
3

And implemented with properties:

>>> class Props(object):
...   _x = 0
...   @property
...   def x(self):
...     return self._x
...   @x.setter
...   def x(self, x):
...     self._x = x
...
>>> props = Props()
>>> props.x = 5
>>> props.x
5

The point

In fact, we've gone from 196 to 208 chars in this simple use case - so why would we use properties at all?

The answer is that in this use case we would not. In fact, we would write thus:

>>> class MyClass(object):
...   x = 0
...
>>> my = MyClass()
>>> my.x = 4
>>> my.x
4

'But!', I can hear you scream, 'there's no encapsulation!'. What will we do if we need to control access to x, make it read-only or do something else to it? Won't we have to refactor everything to the getters and setters that we avoided?

No - we just switch to the property version, add whatever we want, and have not changed the interface one iota! The great thing about properties is not that they replace getters and setters, it's that you don't have to write them to future-proof your code. You can start out by writing the simplest implementation imaginable, and if you later need to change the implementation you can still do so without changing the interface. Neat, huh?
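As a small illustration of that switch, here is a sketch in which client code written against a plain attribute keeps working after the class grows validation. The class names Plain and Guarded, the bump function and the "no negative values" rule are invented for the example.

```python
class Plain(object):
    # First version: a plain attribute, no ceremony
    x = 0


class Guarded(object):
    # Later version: same attribute name, now with access control
    _x = 0

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        if value < 0:
            raise ValueError("x must not be negative")
        self._x = value


def bump(obj):
    # Client code written against the first version; it never changes
    obj.x = obj.x + 1
    return obj.x


print(bump(Plain()))    # 1
print(bump(Guarded()))  # 1
```

Only code that tries to store a negative value notices the difference, and it notices through a ValueError rather than a changed method name.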

Blaag created 120115 13:03

hgapi 1.1.0

As a belated Christmas gift, I just released hgapi version 1.1.0. New since 1.0.1 is support for hg status, merge and revert. This means that right now I have no firm plans for the future, as the tool does what I need it to do. If you have other requirements, add them to the issue tracker.

Blaag created 111229 13:21

Jenkins

Recently, I wanted to migrate some lightweight services from a virtual host to an account at Webfaction, since running a wiki/issue tracker (Trac) and CI server (Jenkins) for a couple of low-volume projects really shouldn’t take a whole machine of its own. Or should it?

This is when I realized that Jenkins is heavyweight in a world of cloud and shared hosts. I already kinda knew, since I've been administrating a Jenkins installation that claims several gigs of RAM, but that's with over thirty Maven projects and a pretty high load.

With Jenkins fired up and two (Ant) projects configured, it claims ~150 MB of RAM - for doing nothing. On a shared host, that’s unacceptable. Under the old RAM limits on Webfaction, it’d be impossible to run; now it’s just claiming 3/5 of my total memory allotment.

So yesterday I set up continuous integration using a Python script that runs the build, determines fail or success, and publishes the log and/or artifacts; it took a bit less than an hour to get working from scratch (admittedly using my own Mercurial lib for integration).
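The core of such a script can be very small. The following is a rough sketch of the run, judge and publish steps only, not the script I actually use: the function name run_build, the log format and the choice to "publish" by writing a plain file are assumptions, and the Mercurial integration is left out entirely.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_build(command, log_path):
    # Run the build, capturing stdout and stderr together
    started = time.strftime("%Y-%m-%d %H:%M:%S")
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    output, _ = process.communicate()
    success = process.returncode == 0  # exit code 0 counts as success

    # "Publish" the result by writing the log where a web server can find it
    status = "SUCCESS" if success else "FAIL"
    header = "%s %s: %s\n" % (started, status, " ".join(command))
    with open(log_path, "wb") as log:
        log.write(header.encode("utf-8"))
        log.write(output)
    return success


# Stand-in for a real build command such as ["ant", "test"]
log_path = os.path.join(tempfile.gettempdir(), "build.log")
ok = run_build([sys.executable, "-c", "print('building')"], log_path)
print("build ok" if ok else "build failed")
```

Nothing stays resident between builds: the script can be started from a commit hook or cron and exits when the log is written.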

Now I'm thinking of maybe creating something useful out of this. Right now, I publish logs as static web pages, but I could just post them as wiki pages to Trac via RPC. That would allow logs to tie into the ticket system and source browser, and I could show build status right there on the Kanban-ish-board next to the tickets.

I've got no firm design done yet, but I'm thinking about what the requirements for a minimalistic CI tool should be:

  • Take no resources apart from disk space when idle
  • Be able to publish fail and success logs in a useful format
    • It should be trivial (for real) to implement support for new targets for result publishing
  • Ability to limit resource usage (number of concurrent builds)
  • Jobs can trigger each other
  • more?
Blaag created 111228 13:21

Problem exists between chair and keyboard

It was pointed out to me that an entry was missing. My initial reaction was "No, it's not!", but then I had to confess that yes, an entry that had been posted was missing. This was because I've developed (and written) for blaag on several computers, and I'd accidentally made a closed head tip and lost an update. Maybe I need to add some functionality to make sure that existing posts do not disappear.

Blaag created 111228 13:04

Autohook updated

When I linked to autohook the other day, I was not prepared for somebody to actually try it and tell me that it did not work.

So, eating my own dogfood I set up the released version of autohook, to run Blaag generation, and realized that it did, indeed, not work. This is fixed now in version 1.1.0, which also does away with using "if __name__ ..." and instead uses setuptools magic to create runnable scripts.

In addition, I realized that setting up hooks for a single repo was kind of a pain, so I simplified the configuration for that use case.

Blaag created 111221 13:48

Tools for better Python

TL;DR
pip install virtualenv pylint ipython autohook

Tools I use every day to write better Python, to make it more fun, or just easier:

  • A good editor: I prefer Emacs, you might like something else, but trust me on this - it’ll take a humongous project to force you to use a full IDE. If you stay clear of large web frameworks, you might never need one. I started out using PyDev since I was used to Eclipse, but now I just don’t think it’s worth the complexity and overhead.
  • virtualenv: I use my virtualenvs for more than just Python these days, and setting a new environment up is the first thing I do when starting a new project.
  • pylint: Not only does it tell you what you might want to fix in your code, it tells you if your code gets better or worse. The more unsure you are, the more you should use pylint.
  • ipython: While the standard REPL is nice, IPython is truly awesome for testing and prototyping. For me, it’s gradually replacing bash as well.
  • unittest/unittest2: Python comes with built-in unit test support - use it (where appropriate)!

These are the tools I use in almost any project, and recently I’ve added one of my own:

  • autohook, to run pylint/unit tests on commit to Mercurial

Finally, I’d like to point anyone starting out with Python to the excellent introduction set up by Mir Nazim at his site.

Blaag created 111220 09:13

On useless testing

Not all testing is valuable. There. I said it.

If you take a look at the source of Blaag, you might notice a certain lack of tests. No unit tests, no tests at all in fact. Does this mean I do not believe in unit tests, TDD and testing in general? No! If you take a look at hgapi, for example, I wrote almost all code using TDD since that was the only way to know I got it right.

When starting on Blaag (which did at the time not have a name), I began by creating testblaag.py, writing import unittest - and then I froze. I had no idea what a test for Blaag would look like. Everything Blaag does is glue code. It fetches data, feeds it to docutils, collects some additional data from Mercurial, creates documents using string.Template and an RSS feed using PyRSS2Gen.

There are some utility functions (implemented as functions or not) that I could have created unit tests for, but what information would I draw from writing a test for sum([int(i) for i in hgdate_string.split()])? I write this code for me, and for me this code is obvious. So how do I know it works? I test it. Manually, since generating the entire html source is the only way for me to know that Blaag works as I intend it to work.

Whereas when writing hgapi, I wrote a tool for others to use and adapt, and a tool that I could not easily look at and see if the result was correct, Blaag is easy to verify: I look at the rendered site, in my browser. If it does not look OK, I have a bug. If it works, I have NO bugs. I might have potential bugs, like the fact that the -f option is currently required when updating, but if the code generates the result I want, consistently, and in reasonable time, Blaag performs perfectly.

Any test would simply be more code that did not add information or value - and there is a name for that kind of code: bloat. And while in the case of Blaag this is easy to see, I believe that more care should be taken generally when writing tests, just as when writing functionality - the question you should always ask yourself when producing code is simply: what value does this code add? If you cannot answer that question, you probably should not write the code, whether it's a test or not.

Because code unwritten never breaks.

Blaag created 111218 17:46

Styling

New style thanks to @markuseliasson - anything ugly is because of my adaptations, anything looking great is his work. Since the main parts are done (still to be done: Mercurial hook, optional disqus/analytics), the next post may actually have other content than the Blaag itself!

Blaag created 111218 11:03

RSS support is go!

There's no reason to roll my own when the guys over at Dalke Scientific Software have created the awesome PyRSS2Gen. It's everything I want from a utility library - it's simple, documented by example, it does one thing, and it does it well. Thanks to them, this Blaag now sports an RSS feed!

Also some bug fixes and minor tweaks done.

Blaag created 111217 22:36

Progress is made

Timestamps from Mercurial are now used to insert creation and modification times for posts in Blaag. I've also done some code cleanup and documentation, although there is still work to be done.

Next up is RSS or Atom feed generation, just have to decide if I should use an existing library or just generate the XML. Knowing myself, I'll probably roll my own, yet again.

Blaag created 111217 20:17

It's alive!

After almost more than four hours of grueling work, my blogging platform codenamed "blaag" works.

I created blaag since I've been thinking about blogging, but didn't like the blogging platforms I found, because they were made of bloat with a little functionality hidden deep within. I did, however, like the idea behind hgblog - especially that it's based around generating the blog from rst using Mercurial hooks, allowing blogging from the comfort of Emacs. In the end, I decided to roll my own.

The code for blaag and the entries for this instance of blaag coexist at Bitbucket, so you can view the source code and the source for the entries themselves there.

The goals of blaag are, in order of priority:
  1. Nicer color scheme
  2. Time/Datestamp posts
  3. RSS support
  4. Add example of Mercurial hooks to repo

I guess documentation should be somewhere up top as well. Suggestions on what color scheme I should use are welcome! (assuming the Disqus integration works)

Blaag created 111216 22:08


Page created using blaag and abusing docutils. RSS Feed generated using PyRSS2Gen.