Fredrik Håård's Blaag

@fhaard
I'm a programmer, consultant, developer, occasional teacher and speaker. Among my least disliked programming languages is Python, and a majority of these posts are related to Python in one way or another.
RSS Feed

Embedding Jython in Java Applications

This article originally appeared on the Smartbear blog. If you like live presentations better than text, you can check out the Jython talk I did at EuroPython 2014.

Jython is an implementation of the Python programming language in Java, running on the Java Virtual Machine (JVM), as opposed to the standard implementation of Python, which is implemented in C (CPython). Python developers can use it to take advantage of the vast libraries available to the Java community, at the cost of adding a bit of Java integration to the application.

The reasons for running Jython instead of the default C implementation of Python, or just sticking to Java when using the JVM, differ from project to project. Personally, I have been using Jython to do rapid prototyping when Java implementations would have become too cumbersome, and to create flexible testbeds for Java applications that would be very hard to build in Java itself. In addition, Jython can be a powerful way to add scripting capabilities to your Java application.

Getting started

First off, you need a JDK for the Java compiler and a copy of the Jython standalone JAR. There are also installers available for Jython, and it can be installed using many package managers; but for setting up your first development environment it’s a good idea to download the standalone binary to learn how everything fits together.

The simplest way to get the standalone JAR is to download it from Maven Central.

If you use Maven for your builds you can of course add Jython directly to your dependencies - the important part is to have the JAR on your classpath when compiling and running your code.

Once you have access to javac and the Jython JAR file, you can create a minimal Java file to test embedding Jython:

import org.python.util.PythonInterpreter;
import org.python.core.*;
public class SimpleExample {
 public static void main(String []args) throws PyException
 {
     PythonInterpreter pi = new PythonInterpreter();
      pi.set("integer", new PyInteger(42));
      pi.exec("square = integer*integer");
      PyInteger square = (PyInteger)pi.get("square");
      System.out.println("square: " + square.asInt());
 }
}

We instantiate a PythonInterpreter object which we then use to access Jython functionality. Note that we use PyInteger objects when passing and retrieving data - this tells Jython what data type our data should be converted into.

This code also shows some of the ways to interact with the interpreter: set and get to define and get the value of variables, and exec to execute a statement, akin to entering it in the REPL (Read Eval Print Loop) and pressing Enter.
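Besides exec, PythonInterpreter also has an eval method, which returns the resulting PyObject directly and can save a set/get round trip for simple expressions (a small sketch, separate from the example above):

PythonInterpreter pi = new PythonInterpreter();
// eval returns the result instead of storing it in the
// interpreter's namespace
PyObject result = pi.eval("42 * 42");
System.out.println(result.asInt());  // 1764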

To run the SimpleExample code, we first need to compile it using javac:

> javac -cp jython.jar SimpleExample.java

And then we can run the resulting class:

> java -cp jython.jar:. SimpleExample
square: 1764

Shortcutting the Edit-Compile-Test Loop

With the Jython example above, you need to recompile the Java file every time you change the Jython code, since it's embedded in the Java source. That negates one of Python's benefits over Java: the short feedback loop that comes from not having an ahead-of-time compilation step.

To run the code without recompiling, you need to create a Python module, and use the PythonInterpreter to import it and run functions from it.

The Python module looks like any other Python module. I created a file called pymodule.py which implements a square() function:

def square(value):
   return value*value

This function can then be executed either by creating a string that executes it, or by retrieving a reference to the function and calling its __call__ method with the correct parameters:

import org.python.util.PythonInterpreter;
import org.python.core.*;
public class ImportExample {
   public static void main(String [] args) throws PyException
   {
       PythonInterpreter pi = new PythonInterpreter();
        pi.exec("from pymodule import square");
        pi.set("integer", new PyInteger(42));
        pi.exec("result = square(integer)");
        pi.exec("print(result)");
        PyInteger result = (PyInteger)pi.get("result");
        System.out.println("result: " + result.asInt());
        PyFunction pf = (PyFunction)pi.get("square");
        System.out.println(pf.__call__(new PyInteger(5)));
   }
}

Now you can run the Java file, edit the Python file, and run it again without having to recompile (as long as the function signature does not change).

This has obvious benefits. You can now edit, run, re-edit, and re-run without rebuilding your project, even if you normally have a complicated tool-chain set up. It's also helpful if you need to test changes on a target system where you cannot easily redeploy binaries. You can go farther with this, if even restarting a service is too expensive: add a call to interpreter.exec("reload(pymodule)"); somewhere where it can be triggered at will, allowing you to reload the module without restarting the system.
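As a sketch of what such a hook might look like (the method and its trigger are hypothetical; only the exec calls come from the text above):

// Call this from a debug endpoint, admin command or similar to pick
// up edits to pymodule.py without restarting the JVM. reload() is a
// Python 2 builtin, which the Jython runtime provides.
public static void reloadModule(PythonInterpreter pi) {
    pi.exec("import pymodule");
    pi.exec("reload(pymodule)");
    // PyFunction references fetched before the reload still point at
    // the old code, so re-fetch any cached function objects here.
}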

Duck-typing and Overloading

Even though the above method works, it is convoluted. It forces us to handle the complexities of translating between Java's static typing and Python's dynamic typing at the call site. Using Jython in this way may be enough if all you want is a quick test run, but for code that is intended to outlive the day, I strongly recommend using some kind of wrapping to make your Pythonic interface look like proper Java code. Non-idiomatic code is seldom a good idea, since any future reader of the code has to deal with the cognitive dissonance it causes, making understanding the code that much harder.

Using a wrapping class lets us use Java's method overloading to hide the fact that we need to call our method using different types of arguments. For the square() function, we implement one version for integer values and another for doubles; because Python has no separate double data type (its float is already double precision), we use PyFloat:

import org.python.util.PythonInterpreter;
import org.python.core.*;
public class CleanImportExample {
   public class PyModule {
       private PythonInterpreter interpreter;
       private PyFunction py_square;
       public PyModule() {
            this.interpreter = new PythonInterpreter();
            this.interpreter.exec("from pymodule import square");
            this.py_square = (PyFunction)this.interpreter.get("square");
       }
       public int square(int val) {
           return py_square.__call__(new PyInteger(val)).asInt();
       }
       public double square(double val) {
           return py_square.__call__(new PyFloat(val)).asDouble();
       }
   }
   public void run() {
       PyModule module = new PyModule();
       System.out.println(module.square(2));
       System.out.println(module.square(2.2));
   }
   public static void main(String [] args) throws PyException {
       new CleanImportExample().run();
   }
}

Here, we instantiate the PythonInterpreter in the wrapper class. This means that the class’ users need to know nothing about Jython; but it also means that we may end up with several instances of the interpreter, which is probably not what we want.

Generally, you can use a singleton pattern to create your PythonInterpreter, as long as your Python modules are stateless. One way to implement this pattern is to have a class dedicated to holding a shared PythonInterpreter:

public final class SharedPythonInterpreter {
    public static final PythonInterpreter interpreter =
                                      new PythonInterpreter();
}

Explicitly naming it Shared makes it easier to remember not to manipulate state in ways that may break, and any class can now fetch an interpreter using SharedPythonInterpreter.interpreter. Of course, there are cases where you want to keep state in your interpreter; in those cases I recommend instantiating it in the wrapper constructor, or passing it into the constructor if it should be shared among several classes.
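For the stateful case, a constructor-injected variant could look something like this (a minimal sketch; the class name is hypothetical):

public class StatefulPyModule {
    private final PythonInterpreter interpreter;
    // The caller decides which interpreter to hand in, so several
    // wrappers can deliberately share (or not share) state.
    public StatefulPyModule(PythonInterpreter interpreter) {
        this.interpreter = interpreter;
        this.interpreter.exec("from pymodule import square");
    }
}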

The final code, also adding crude namespacing to the import statement so that we won’t collide with any other imports named square, then looks like this:

import org.python.util.PythonInterpreter;
import org.python.core.*;
public class FinalExample {
    public class PyModule {
        private PythonInterpreter interpreter =
                SharedPythonInterpreter.interpreter;
        private PyFunction py_square;
        public PyModule() {
            this.interpreter.exec(
                "from pymodule import square as PyModuleSquare");
            this.py_square =
                (PyFunction)this.interpreter.get("PyModuleSquare");
        }
        public int square(int val) {
            return py_square.__call__(new PyInteger(val)).asInt();
        }
        public double square(double val) {
            return py_square.__call__(new PyFloat(val)).asDouble();
        }
    }
   public void run() {
       PyModule module = new PyModule();
       System.out.println(module.square(2));
       System.out.println(module.square(2.2));
   }
   public static void main(String [] args) throws PyException {
       new FinalExample().run();
   }
}

Performance

Go into this with your eyes open: Jython performance suffers in comparison to pure Java code. Jython also often loses against CPython, partly because CPython programs can spend much of their time running C extensions (at C speeds), and partly because CPython's memory consumption and startup time (the latter mainly relevant for short-running programs) are significantly lower than the JVM's.

The one case where Jython may reliably outperform CPython is purely CPU-bound work in a threaded environment, since the lack of a GIL on the JVM means that all of a computer's cores can be fully utilized. Still, if performance is the main issue, Jython may just not be a very good fit.
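If you want to see the difference for yourself, a CPU-bound threaded script along the lines of the sketch below can be run under both CPython and Jython; under Jython the threads can actually execute in parallel. (A minimal benchmark sketch, not from the original article.)

import threading, time
def burn(n):
    # Pure CPU work: CPython's GIL serializes this across threads,
    # while Jython can spread the threads over all cores
    total = 0
    for i in range(n):
        total += i * i
start = time.time()
threads = [threading.Thread(target=burn, args=(5000000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("elapsed: %.2fs" % (time.time() - start))

Politics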

In addition to performance, another word of warning may be in order: Introducing Jython into your Java stack is very easy, but you are still introducing a new programming language as well as new dependencies and increasing the complexity of your stack. In many cases doing this is fine, or even welcome. But to do it without signoff from the appropriate stakeholders is not likely to make you any friends, especially if you are working in a project or at a company that lacks strong Python knowledge.

Finding a new language added “under the radar” in a core product may very well be what turns someone against Python forever (“Python? That unmaintainable garbage XYZ snuck in?”) So spend a bit of time ensuring that other people on your team are on board with the idea and are prepared to take the effort to learn Python.

Final words

If you do have permission to use Jython, you just may start to see use cases everywhere; there are a lot of things that would take hundreds of lines of Java that are trivial when you have access to Python and its standard library. I’ve found that to be the case especially when doing operating system integration or automation on Unix systems (a traditional strong point for Python), or doing low-level socket programming; using Jython allows for some dramatic simplifications and productivity gains. So go forth and make the Java world a bit more Pythonic!

Blaag created 140815 17:38

Call for Proposals: PyCon Sweden 2014!

That's right, PyCon Sweden 2014 will take place in Stockholm on May 20-21, and you should be there!

Even better, you should propose a talk on anything related to Python, and/or see if your company would like to sponsor the event - this is a chance to reach 250 developers over two days, and one you don't want to miss.

This event will be the first national Python conference in Sweden - a country that has hosted EuroPython twice and has vibrant user groups and meetups, but for some reason has not been able to pull together a national PyCon until now.

So join us in this exciting new endeavor! If you're located in Sweden and interested in Python, joining the Python Sverige association is free and as easy as filling in your name - we'll have an annual meeting in conjunction with PyCon Sweden, and we'll need volunteers for the conference.

Let's make this awesome!

Blaag created 140220 14:17

Basic Python course in Karlskrona

I will be giving a course in basic Python in Karlskrona on March 27.

This is a course for those who already know programming, and would like to learn Python from the ground up, or for the developer who has done some work in Python but wants a broader knowledge and foundation.

During the course we will work with a strong focus on practical knowledge and learning by doing, so that attendees can work independently with Python after the course. A large number of exercises are designed to give you the opportunity to use test-driven development to explore Python, and discussion of the exercises gives a deeper understanding of the opportunities that the language offers.

The course is a full-day course, with lunch and coffee included. Attendees are required to have basic programming and computer knowledge, and to bring their own laptop to be able to complete the exercises.

If interested, you can read more on my homepage, or go directly to registration (n.b. - the course will be held in Swedish!)

Blaag created 140217 13:48

Embedding Jython in Java Applications

I just had an article on Jython embedding appear over at Smartbear.

Blaag created 140102 21:33

I accidentally the server

Some of you may have noticed that BitSyncHub (and my company homepage) went down for almost three days this week.

Now, I'm proud of my skills building highly available, fault-tolerant, state-of-the-art and bleeding edge systems, deployed on OS, vendor and geographically redundant infrastructure, that never go down. Sometimes a failover may take a short while, or parts of a system may get bogged down and queues build up, but the system, even on its knees, never fails.

So how, you ask, did my own homepage and the single service hosted there go down for several days?

Well, let's just say all systems I build have a single point of failure in the actual execution stage - me. When I set up the homepage for my new company, I just tossed something up right there on a virtual host; installed lighttpd and emacs, created the index page, done. When I added BitSyncHub, it was for my own use, so again, I just installed uWSGI and Celery, created the service, and then, as an afterthought, I made it public so that others could use it.

Since this was all installed on an OpenVZ server, backed up and secure, I thought that I was fine - nothing mission critical, almost no users, and if anything happened, there was always the backup.

The backup kept me secure for anything except for meatware errors.

When I noticed a spike in usage of BitSyncHub, I figured I should perhaps secure the service against extended downtime - just because a service is free it shouldn't be unreliable. I was thinking that I might set a new server up and try out Docker at the same time. Let's enter my mind at the time:

"So, I'll fire up a new OpenVZ instance and... oh, right, wrong kernel version, I need a VMWare instace to be able to choose kernel. Dum-de-dum - let's remove the new server. Click-click-click. There, all gon... hm. It's still there, huh. [SUDDEN SENSE OF DOOM] Eh - where's my production server?

Yeah, I removed the server, and with it I also removed the backups. Suddenly, the homepage, BitSyncHub, and the machinery it had been running on was gone - forcing me to recreate it all from scratch.

So if you find any bugs that suddenly appeared in the BitSyncHub service, drop me a mail and I'll fix them. I'll be over here contemplating the fact that if you consider yourself Hot Stuff when it comes to high availability and resilience on customer projects, maybe you should apply that to your own products as well.

Blaag created 131213 15:43

BitSyncHub now supports git, gitapi released, new hgapi version

Since I got several requests for BitSyncHub to support synching BitBucket Git repositories to GitHub, I went ahead and added the functionality. The service will detect the appropriate repository type, and push specified branches - although the source branch will be ignored for now, so a branch specification of 'foo:bar' will simply push 'bar'.

To make this happen, I finally had to bring gitapi a bit closer to completion, so I released the first version to PyPi for general consumption as well.

To top it all off, Jan Williems of elevenbits has added push/pull functionality and done a general cleanup of the hgapi codebase, so version 1.7.1 just went up to PyPi. This release is all Jan's, I've done no work at all except for uploading his work.

Blaag created 130912 20:39

Synchronize Bitbucket to Github automatically

Introducing BitSyncHub

Since I'm an automation nut, when I found Travis CI, I was understandably excited - automatic running of my testcases for hgapi from the repository as opposed to a pre-push hook (as I have had it set up since the beginning of time) would avoid the oh-so embarrassing mistakes of forgetting to add a new file to the repository and having a non-working version in the repo. I just have to set up some service to synch to the GitHub mirror and all will... be... well?

Turns out there was no such service. A hundred pieces of advice on how to mirror using push-hooks in your local repository, but since I don't always commit from the same computer, I would need to keep all instances (including future ones) set up properly, and never again could I be a tad lazy and accept a pull request instead of pushing it from my local repo. This, to me, is not an acceptable state of affairs.

So last week I spent a couple of hours setting up a new service, dubbed BitSyncHub, that will accept POST requests from Bitbucket and synchronize a (Mercurial) repository with its GitHub mirror. It is set up using uWSGI, hgapi with hg-git, and Celery for job control. It's a bit rough in that it does not report errors (since it does not run synchronously), and always pushes to GitHub using the same certificate and user, but I've not been able to break it (recently), it only requires a one-time setup, and it will keep your branches in synch!

Blaag created 130724 20:13

See you in Florence this summer?

I'll be in Florence for EuroPython 2013 and will do (more or less) a follow-up to the training session I held last year - a very hands-on venture into Python language and standard library features that will allow you to implement your bad ideas in awesome hacks and your good ideas with beautiful magic. This is how much fun we had last year!

[Photo from the 2012 training session: /static/ep2012dcsl.jpg]

Ok, so they're all looking at their screens, but that's kind of the point with a training session in my opinion, hands on keyboards as much as possible.

In other news, I have created a gist with a cleaned up version of my fetch/unpack/csvparse code incorporating some of the suggestions I got here and on Reddit.

Blaag created 130403 05:50

Batteries included: Download, unzip and parse in 13 lines

The other day I needed to download some zip files, unpack them, parse the CSV files in them, and return the data as dicts. I did the very same thing a couple of years ago, and although the source is lost, I recall having a Python (2.4?) script of about two screens to do the download - so a hundred lines. When re-implementing the solution now that I know Python and the standard library better, I ended up with 12 lines written in just a few minutes - edited for blogging clarity it clocks in at 13 lines:

import zipfile, urllib, csv, os
def get_items(url):
  zip, headers = urllib.urlretrieve(url)
  with zipfile.ZipFile(zip) as zf:
    csvfiles = [name for name in zf.namelist()
                 if name.endswith('.csv')]
    for filename in csvfiles:
      with zf.open(filename) as source:
        reader = csv.DictReader([line.decode('iso-8859-1')
                                  for line in source])
        for item in reader:
          yield item
  os.unlink(zip)
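Using it is then just a matter of iterating over the generator (the URL below is made up):

for item in get_items('http://example.com/data.zip'):
    print(item)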

As trivial as it is, I think it is a nice example of just how much you can do with very little (coding) effort.

Edit: I created a gist with a cleaned up version using codecs.getreader. I'll be leaving this version as it is though.

Blaag created 130331 10:12

WGS-84 distance calculations at the speed of C

When we started out doing fleet management at Visual Units, one thing was really hard to get right - distance calculations. There was no end of information available, but most-to-all of it was on a level of mathematics far beyond a poor developer who feels that anything beyond discrete mathematics and basic geometry and statistics really should be somebody else's problem. The implementations that could be found were closed-source, licensed versions we really could not afford at that stage.

For a while we got by using a solution that relied on having a variant of Lambert conformal conic projection coordinates - it was sufficiently exact if not perfect, and our maps used the same projection, so it worked - although there was the added burden of transforming our stored (WGS-84) coordinates to Lambert every time we needed calculations done. A couple of years ago, however, we switched to Google Maps API and so we really had no use for Lambert - and increased load and precision demands made using the current solution a worse and worse choice.

Enter Chris Veness. Or rather, enter his implementation of the Vincenty inverse formula (pdf). Even though the math is beyond me, porting the Javascript implementation to Python was straightforward, and some testing showed that the result was both faster and had better precision than the previous solution.

Fast-forward to a few months ago: suddenly the performance was starting to look like something that could become a problem. We have many reasons for doing distance calculations, and while the batch jobs were not a problem, any amount of time that can be shaved off user-initiated actions is welcome.

So, I thought to myself, I've ported it once, how hard can it be to do it again? After all, when raw speed becomes the issue, the Python programmer reaches for C. Porting it was once again straightforward, mapping the Python function

def distance(x1, y1, x2, y2):
    ...

into

const double distance(const double x1, const double y1,
                      const double x2, const double y2)
{
    ...

The resulting C code is almost identical to the Python (and Javascript) implementations but runs about 6 times faster than the Python implementation. Allowing batch submission of calculations instead of calling once for every calculation, eliminating some FFI overhead, would increase the speed further.

$ python2.7 -m tests.test_distance
Time elapsed for 100000 calculations in
    Python: 1952.70
    C: 300.46
    Factor: 6.50

Wrapping the C and calling it was simple enough using ctypes, and I've added fallback to the Python implementation if the C shared library cannot be found; a small __init__.py in the package hooks up the correct version:

from .distance import distance as _py_distance
try:
    from ctypes import cdll, c_double
    dll = cdll.LoadLibrary('cDistance.so')
    dll.distance.restype = c_double
    dll.distance.argtypes = [c_double, c_double, c_double, c_double]
    distance = dll.distance
    _c_distance = dll.distance
except OSError: #Fall back to Python implementation
    distance = _py_distance
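Calling code then does not need to care which implementation was loaded - for example (a hypothetical call; the package name and coordinates are just for illustration):

from distance import distance  # hypothetical package name
print(distance(57.70, 11.97, 59.33, 18.06))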

Of course, this depends on the C code being compiled into cDistance.so and that file being available for linking - and since the .so name is hardcoded, a Windows DLL won't work. I really did intend to clean it up more before making it open source, but since I've been meaning to start open sourcing some of our tools for years now and never really found the time, I thought it would be better to throw it out there and postpone making it pretty. I hope someone can find some use in this, and I'll try to get it cleaned up and packaged Real Soon Now.

Blaag created 130221 18:46

pyRest part 5: You can actually use this

(part 1, part 2, part 3, part 4)

I'm almost done with the parts to make this project PyPI ready - it can now work on your application as long as you implement the actual code to route calls to the right part of your API - this works:

python -m pyrest.launcher  --server pyrest.integration.cherry \
       --api=pyrest.examples.hg

Configuration can also be done using a config file, instead of or together with command-line arguments. This means that if you can describe your API's usage using modules (representing resources) and post/put/get/delete functions in those modules, you can pretty much just copy the line above and have a REST interface, as long as you return data that the json library knows how to serialize.

The magic is all in the new launcher.py, which reads configuration, instantiates pyrest and an appserver interface, and hooks up the specified API. Very few other changes needed to be made - none to pyrest.py, and for cherry.py the only change was to move functionality from the 'if dunder main' block to a new start() function.

Time for a goal check!

  • Create a tool that can expose a Python API in a RESTish fashion
  • The API itself must not have to know about the tool
  • It must handle HTTP errors
  • It must run on Python 3.2+
These are all done, with the new launcher module taking care of reading configuration/arguments and wiring up the correct backend server - although right now, there's only the CherryPy version available.
  • It must run on at least CherryPy and two other webapp frameworks TBD (no, not Django)
Not OK, just CherryPy implemented.
  • It must be able to encode data into JSON before returning it
Partially OK, but there's no clean way of specializing the serialization if you send data that is not handled by the standard library json module.

In addition to those goals, I'll also have to create a setup.py to make the tool installable before I can call it version 1.0.0.

I'd also like to take the time to give a huge thanks to the people over at DZone - not only did they ask if they could repost my blogs (which they don't have to, since I allow anyone to do anything with them as long as they don't blame me for damage done...), but they actually sent me goodies - nerf guns, a t-shirt and other stuff - all the way across the Atlantic, just because they like my blog. Crazy. If everyone who likes this blog did that I'd probably have, like, three nerf guns!

That being said, code is as always available at Bitbucket.

Blaag created 130211 22:30

pyRest part 4: Separating the parts

(part 1, part 2, part 3)

I've now split the code into separate parts - pyrest.py now only has generic functionality for hooking and routing, along with a bunch of helpers to create responses with HTTP response types. In fact, it's only 35 lines of code, and that's the entire 'core' of pyRest so far.

The CherryPy integration has moved to the pyrest.integration package as cherry.py - it's still pretty clumsy to use (python -m pyrest.integration.cherry can _only_ hook up the hgapi example code), and the hgapi implementation has been moved into the pyrest.examples.hg package. The CherryPy parts have not changed much beyond always expecting a Response tuple and using it to set response status and content, but the hgapi integration now sports a post function, allowing me to commit the code just written using itself! changeset.py also uses the new Response helpers to create the return values.

The code was committed using a POST request to /api/changeset?message=Comitting via pyrest. Next, I'm planning to make the Mercurial integration useful, and/or the CherryPy integration a bit more robust.

Code is as always available at Bitbucket.

Blaag created 121220 12:44

pyRest part 3: Routing and responsibilities

In part 2, I hooked up the API to CherryPy in a very crude fashion, and this time we'll look at how we can add handlers for resources in a less clumsy way. I decided to keep handlers on one 'level' only - that is, /sketch/parrot and /sketch will both be handled by the /sketch handler. This is because I find that the same sub-resource often is present in several places (what about /props/parrot?) and having handlers like this simplifies stuff and makes the magic more readable.

That magic looks like this - it is passed a package, finds all modules that have at least one of get/post/put/delete implemented, and stores them in a name->module dict.

def get_handlers(package):
    handlers = {}
    modules = [member for member in inspect.getmembers(package)
               if inspect.ismodule(member[1])]
    for member_name, member in modules:
        if [fn for name, fn in inspect.getmembers(member)
               if name in ('get', 'post', 'put', 'delete')]:
            print("Adding handler %s" % member_name)
            handlers[member_name] = member
    return handlers

Later, when we get a request, we interpret the first part of the path as the resource name (although since I mounted it at /api, it becomes /api/<resource>), and then use that string to get the correct module, check for a handler for the specific method, and call it if it exists.

def requesthandler(handlers, method, resource, *pathargs, **kwargs):
    """Main dispatch for calls to PyRest; no framework specific
    code to be present after this point"""
    if resource not in handlers:
        return Response('404 Not Found', 'No such resource')
    if not hasattr(handlers[resource], method):
        return Response('405 Method Not Allowed',
                        'Unsupported method for resource')
    return_data = getattr(handlers[resource],
                          method)(*pathargs, **kwargs)
    return Response('200 OK', json.dumps(return_data))

Right now, there's nothing exciting going on in the API, so the routing logic just calls hgapi and assumes everything will be in order:

def get(ref=None):
   rev = hgapi.Repo('.')[ref]
   return {
       'node': rev.node,
       'desc': rev.desc
   }

So, when we GET /api/changeset/1, requesthandler will be passed this: ({'changeset': <module>}, 'get', 'changeset', ('1',)). It will look up 'changeset' to get the module, then retrieve and call 'get' using getattr, passing in the '1'. changeset.get() will then call hgapi and stick the result into a map, and requesthandler encodes it as json and returns it. Since none of the parts involved actually cares what the arguments are, you can just as well use /api/changeset/tip or /api/changeset/default.
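Since requesthandler is an ordinary function, the dispatch can also be exercised without any web server in front of it - something like this hypothetical smoke test (not from the repo):

response = requesthandler(handlers, 'get', 'changeset', '1')
assert response.status == '200 OK'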

As it looks now, the next part _should_ probably be adding some tests, but since I'm not totally decided on how I want to write my tests, I'll push ahead with separating the code instead - the current PyRest class and everything that has to do with CherryPy should go into a pyrest.cherrypy package or something similar, the requesthandler and get_handlers functions should stay as part of pyRest proper, and the backend package should probably end up in an example package.

Code is as always available at Bitbucket.

Blaag created 121214 17:24

pyRest part 2: Hooking the API

In part 1, a very unexciting base CherryPy implementation was all we had, but now it's time to hook up something real! Instead of creating a mock API to work against as example code, I've decided to use hgapi to access the pyrest repo itself as example implementation - very meta!

I've decided to hook the API in before I refactor the code to separate the web framework from pyRest, since I firmly believe in getting things working first and cleaning up after. I did create a namedtuple to hold basic response values so that the requesthandler function can be extracted later.

The 'interesting' part looks like this:

from collections import namedtuple
Response = namedtuple('response', 'status content')
def requesthandler(method, *pathargs, **kwargs):
    """Main dispatch for calls to PyRest; no framework specific
    code to be present after this point"""
    if not method == 'get':
        return Response('500 Server Error', 'Not implemented')
    repo = hgapi.Repo('.')
    return Response('200 Ok', repo.hg_id())

...not much yet, but it responds to any GET request with the current Mercurial node id.

My intention is that the final result will be three separate parts - a routing/domain-specific part that uses hgapi, pyRest proper, which handles requests and automatically hooks up the routing, and a CherryPy part which integrates with CherryPy and will need to be reimplemented for every web framework supported.

There will be at least one update before that though, because next up is "autowiring" the routing logic. Code for the project is available at Bitbucket.

Blaag created 121209 12:47

Simple REST-ful (-ish) exposure of Python APIs

After having written code to expose APIs through RESTful web services a couple of times, I've decided to do it once more, only this time I won't get paid, I won't have deadlines, I'll write it so I'll never have to write it again, and I'll make it available as open source.

Problem is, I'm a lazy, lazy person, and have not been able to muster the energy to actually get writing, which leads me to this blog post - since I've not been updating the blog as I should either, I'll kill two projects with one meeting and make the actual development process open as well, as a series of blog posts and a repository at BitBucket.

For someone else to be able to follow the work, I obviously have to nail down what the goal of this exercise is:

* Create a tool that can expose a Python API in a RESTish fashion
* The API itself must not have to know about the tool
* It must run on at least CherryPy and two other webapp frameworks TBD (no, not Django)
* It must handle HTTP errors
* It must be able to encode data into JSON before returning it
* It must run on Python 3.2+
* It must not care what the proper definition of RESTful is

In addition, some good-to-haves:

* It may make linking between resources easier (if feasible)
* It may be able to use other data formats than JSON
* It may run on Python 2.7

Because I enjoy working with CherryPy since it's very good at staying out of my way, I'll start out writing for CherryPy and then generalize from there. Just to get started, I have created a minimal CherryPy app to work from, even though I'll split the tool from the framework (or the REST framework from the web framework?) later. The entire code looks like this:

import cherrypy
def requesthandler(*pathargs, **kwargs):
    cherrypy.response.status = "500 Server Error"
    return "Not implemented"
class PyRest(object):
    def index(self, *args, **kwargs):
        return requesthandler(*args, **kwargs)
    index.exposed = True
CONF = {
    'global': {
        'server.socket_host': '0.0.0.0',
        'server.socket_port': 8888,
    }
}
if __name__ == '__main__':
    ROOT = PyRest()
    cherrypy.quickstart(ROOT, '/', CONF)
def application(environ, start_response):
  cherrypy.tree.mount(PyRest(), '/', None)
  return cherrypy.tree(environ, start_response)

Blaag created 121206 20:25

Sending non-valid names as arguments

I got a feature request on hgapi the other day, pointing out that it was not possible to filter the Mercurial log using the API, since there is no dedicated way to do it and the fallback method - sending keyword arguments that will be passed to the command line - does not work. The signature of the method in question is

def hg_log(self, identifier=None, limit=None,
           template=None, branch=None, **kwargs):

with kwargs accepting any keyword arguments and passing them to the command line. So, for getting a log by branch, trying

repo.hg_log(-b=mybranch)

seems like a good idea, until you realize that '-b' is not a valid identifier, and so this code is invalid. However, almost all Mercurial options you might want to send like this start with a dash, so what is the point of using kwargs at all?

Notice: Entering bad practice land!

It is totally possible to send keyword arguments to a function in Python that are not valid identifiers, by using argument unpacking. Given a function like this:

>>> def myfunc(positional, kwarg='Hello', **kwargs):
...     print(positional)
...     print(kwarg)
...     for key in kwargs:
...         print("%s: %s" % (key, kwargs[key]))

you can send any dict in like this:

>>> myfunc(1, **{'-1-': 'dash-one-dash'})
1
Hello
-1-: dash-one-dash

Not a very nice way of doing things, but it can be handy in - for example - a fallback case where you want to support future arguments, obscure arguments, and generally just anything.
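Applied to the hg_log case that started this post, the same trick lets you pass dashed options through the kwargs fallback:

repo.hg_log(**{'-b': 'mybranch'})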

(hgapi 1.3.1a3 was just uploaded to the cheeseshop; hg_log now takes a 'branch' argument)
Blaag created 121102 17:29

See me on an island!

I've really neglected to update the blog lately, but that's at least in part because I've been busy doing preparations for talks - so you could see me instead of read my blog! I'll be speaking both at PyCon UK and PyCon Ireland, so if you're going to either come chat with me! And, you know, maybe see my talks...

Also, I do have a proper blog post with actual content a-brewing, and I promise it will be done Real Soon Now (tm).

Blaag created 120916 19:40

You should send a presentation proposal to PyCon

Next year, in March, I will be at PyCon. It will be the third time I attend PyCon - ever since I attended my first, not going has not really been an option.

There are lots of good things about PyCon - meeting interesting people, seeing San Francisco, beating off recruiters with a stick, hanging out in the hotel bar and chilling in the evenings - but the best part is that the talks are so many, and so good. Now, the only way there'll be lots of good talks next year is if there are a lot of good proposals to choose from!

You might think it scary to propose a talk to such a large conference, but I can promise you that the Program Committee are really nice people and that unless you're a professional speaker, the feedback you get is bound to make your next proposal - and your next talk - better.

Besides, if you're employed - what better argument to make your boss send you to California than that you are going to present a talk at _the_ conference for Python?

So check the example proposal. Read the call for papers. And send your proposal before September 28th, so we can get another awesome PyCon next year.

I've done my part, have you done yours?

Blaag created 120905 17:59

Using the AST to hack constants into Python

During EuroPython 2012, after my training and talks, I really needed to do some coding, so I started hacking on a 'practical' application of the AST - making (some) constants faster than local variables, instead of slower by inlining them on import.

To do this, I use a custom importer on meta_path to intercept the import, and find any assignments that look like constants - that is, any module-level variable in ALL_CAPS that is a simple string or a number.

I store those names and values, and then simply replace any attempt to load the name with the stored value instead. (Yes - this can go wrong in many horrible ways!)

For those who don't know, the AST - Abstract Syntax Tree - is a tree representation of the syntactic structure of your program. Python lets you look at and modify the AST before compiling it, which in effect allows you to rewrite the very structure of a program long before it runs. For a good introduction to the AST I heartily recommend What would you do with an AST by Matthew Desmarais.
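To get a quick feel for what the tree looks like, you can parse a trivial assignment and dump it (the exact node names in the output vary between Python versions):

import ast
tree = ast.parse("X = 1")
print(ast.dump(tree))
# -> Module(body=[Assign(targets=[Name(id='X', ...)], value=...)])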

To do this, I first need to intercept the import - the importer itself is not very interesting; it tries to find the source file for any imported module (if, for example, a .pyc file is found, it simply strips the 'c' and tries to load the corresponding source file).

With the source file found and read, the importer just calls the transformer, compiles the result, and sticks it into sys.modules:

module = types.ModuleType(name)
inlined = transform(src)
code = compile(inlined, filename, 'exec')
sys.modules[name] = module  # register before exec, as regular imports do
exec(code, module.__dict__)

The transform method parses the source, creates the NodeTransformer that will modify the AST, and passes the parsed AST to it.

def transform(src):
    """Transforms the given source and return the AST"""
    tree = ast.parse(src)
    cm = ConstantMaker()
    newtree = cm.visit(tree)
    return newtree

Our NodeTransformer is equally simple, and overloads visit_Module (to find the constants) and visit_Name (to replace uses of the names with their values). visit_Module starts by building a list of all assignments in the module body, and then filters out assignments that fulfill our criteria for constants: they should be numbers or strings, and they should be named in ALL_CAPS. Any such assignments are stored in a name->value map that can then be used by visit_Name.

def visit_Module(self, node):
    """Find eglible variables to be inlined and store
    the Name->value mapping in self._constants for later use"""
    assigns = [x for x in node.body if
               type(x) == _ast.Assign]
    for assign in assigns:
        if type(assign.value) in (_ast.Num, _ast.Str):
            for name in assign.targets:
                if RE_CONSTANT.match(name.id):
                    self._constants[name.id] = assign.value
    return self.generic_visit(node)

The parsing of assignments must be done before the call to generic_visit, or we'll not have the mapping until after the rest of the module has already been visited. The mapping makes the work of visit_Name extremely simple:

def visit_Name(self, node):
    """If node.id is in self._constants, replace the
    loading of the node with the actual value"""
    return self._constants.get(node.id, node)

And that's all we need to do! A simple (simplistic?) benchmark shows that it works as expected for simple cases - given the following source that mixes constant access with some other 'work':

import random
ONE = 1
TWO = "two"
THREE = 3
FOUR = 4.0
def costly(iterations):
    tempstr = ""
    obj = MyClass()
    for i in range(iterations):
        tempstr += ONE * TWO * THREE + str(i)
        obj.change()
    tempstr += str(obj.value)
    return tempstr
class MyClass(object):
    def __init__(self):
        self.value = random.random()*THREE
    def change(self):
        self.value += random.random()*FOUR

...a transformed version runs 15-20% faster than the untransformed version. Of course, my first benchmark which did only loading of constants was a bazillion times faster, but also not very interesting.

This is of course a very limited implementation - a 'proper' implementation would have to prevent writing to constants (right now writes will be silently ignored by code in the current module), in-module writes to a constant should be detected, the transform should fallback to return the untransformed tree if it fails, and maybe, just maybe, it's just not a very good idea at all.

It was, however, great fun to write! The code is available at the blog repository - the timer I use for the benchmarks is written by @BaltoRouberol

Next experiment will be inlining of functions, I think, or maybe lazy evaluation of function parameters.

Blaag created 120726 18:57

EuroPython impressions

I had a blast at Europython - I made new friends, went to a couple of talks that gave me some ideas for the future, and my own talks seemed to go down well.

All in all, the EP2012 organization was great, the food was way beyond 'normal' conference fare, and the venue was good - although finding a place to sit down during lunch was hard.

I would have liked some kind of feedback mechanism in place for the talks - had I known there would be none I'd have arranged one for my own talks and especially the training session I held. I also feel that the conference was a bit too long; perhaps it was not necessary to give me the opportunity to take the stage thrice? Overall, I felt that the quality of talks did not hold up for five days, and I would have preferred a shorter but more consistently high-quality program (even if some or all of my own sessions had been cut). By the end of the week, I was worn out and going to talks I'd been excited about before the conference almost became a chore. In addition, I attended no BoFs, and I'm not sure if there were any; either there need to be BoFs or the information about them needs to be obvious enough that even I get it.

However, as I started out saying, EP2012 was a great conference overall, and I will definitely go again next year, and recommend anyone else to do so as well. There were some amazing talks, great discussions, and @PythonItalia did a wonderful job organizing it all. I really felt that the volunteers were there to help me and they made me feel welcome both as a speaker and attendee.

If you did attend any of my talks or my training session, and have any comments at all, please comment on this post, ping me at @fhaard, and/or mail me at fredrik (at) haard (dot) se.

Also: Kernel failure during a presentation does not have to be the end of the world, but seriously? The one time I don't have index cards?

Blaag created 120714 15:02

You deserve practice

I enjoyed The Clean Coder by Uncle Bob, and would recommend it to any serious developer. I agree with almost everything in it, but there is one jarring exception - I disagree strongly with his view that because it's your own responsibility to practice, you should not do it on paid time.

Under the headline "Practice Ethics", he states: "Professional programmers practice on their own time. It is not your employer’s job to help you keep your skills sharp for you [...] Football fans do not (usually) pay to see players run through tires. Concert-goers do not pay to hear musicians play scales. And employers of programmers don’t have to pay you for your practice time."

But the best football clubs do pay players for training time, and the best orchestras pay their musicians to practice. I agree with one thing, and that is that an employer doesn't have to pay you for practice time, but an employer that does not want to pay for programmer training time does not deserve the best programmers.

I believe that if you have ambitions of being a great programmer you should find an employer that lets you become great. If you want to be among the best, you should find an employer that understands that hiring and keeping the best programmers means that they will need to spend time honing their skills. This is by no means black and white, but there are not that many great programmers out there, and it's a seller's market. The first step in taking responsibility for your own career is to make sure that you have an employer that deserves you.

Besides, a common trait of almost all great - or, indeed, professional - programmers I’ve met is that programming is not their only passion; and most of them work enough that doing a significant amount of training outside of work would kill their ability to pursue other interests in a meaningful way. I know that my own quality of work degrades rapidly when the workload is high enough that I have to cut down on other interests, simply because I need real down time to solve hard problems, and anything related to programming is not down time.

Blaag created 120423 08:42

Version 1.2.0 of hgapi released

hgapi is a pure-Python API for Mercurial, which uses the command-line interface for maximum compatibility. It is tested for Python 2.7 and 3.2.

Version 1.2.0 fixes a few bugs, and allows iterating over a repository as well as using slices (i.e. repo[0:'tip']) to get a set of changesets. API documentation is also slightly improved.

Special thanks to Nick Jennings and Adam Gomaa who contributed patches.

Blaag created 120413 12:40

Protocol specifications written in Python

This is a writeup of a talk I did recently at Software Passion Summit in Gothenburg, Sweden. For more background info, see the post I did prior to the conference.

Writing a specification in a full-blown programming language like Python has upsides and downsides. On the downside, Python is not designed as a declarative language, so any attempt to make it declarative (apart from just listing native data types) will require some kind of customization and/or tooling to work. On the upside, with the declaration in the same language you write your servers in, you can use the specification itself rather than a generated derivative of it. Writing custom - in this case minimal - generators for other languages is also simple, since you can use Python introspection to traverse your specification, and the templating logic of your choice to generate source. This makes it possible, for example, to target a J2ME terminal that just won't accept existing solutions, and where dropping a 150K jar file for a protocol implementation is not an alternative.

For me, this journey started around 2006, when I began to lose control over protocol documentation and protocol versions for the protocol used between terminals and servers in the fleet management solution Visual Units Logistics. After looking for, and discarding, several existing tools, and after being inspired by the fact that we usually configure Javascript in Javascript, I started to sketch (as in, ink on paper) what a protocol specification in Python would look like. This is a transcription of what I came up with at the time:

imei = long
log_message = string
timestamp = long
voltage = float
log = Message(imei, timestamp,
           log_message, voltage)
protocol = Protocol(log, ...)
protocol.parse(data)

With this as a target, I created the first version of a protocol implementation. It looked similar to the target version, but suffered from an abundance of repetition:

#protocol.py
LOG = 0x023
ALIVE = 0x021
message = Token('message', 'String', 'X')
timestamp = Token('timestamp', 'long', 'q')
signal = Token('signal', 'short', 'h')
voltage = Token('voltage', 'short', 'h')
msg_log = Message('LOG', LOG, timestamp, signal, voltage)
msg_alive = Message('ALIVE', ALIVE, timestamp)
protocol = Protocol(version=1.0, messages=[msg_log,msg_alive])
#usage
from protocol import protocol
parsed_data = protocol.parse(data)
open('Protocol.java','w').write(protocol.java_protocol())

The implementation around this is simple; the Token class knows how to parse a part of a message, the Message class knows which Tokens to use (and in which order), and the Protocol class selects the correct Message instance using a mapping of marker bytes to Message instances.
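To make that division of labor concrete, a stripped-down sketch of the three classes could look something like this (illustrative only - string tokens and all error handling are omitted):

import struct
class Token(object):
    def __init__(self, name, type_name, fmt):
        self.name, self.type_name, self.fmt = name, type_name, fmt
    def parse(self, data):
        # Parse this token's bytes; return the value and the remainder
        size = struct.calcsize('!' + self.fmt)
        value, = struct.unpack('!' + self.fmt, data[:size])
        return value, data[size:]
class Message(object):
    def __init__(self, name, marker, *tokens):
        self.name, self.marker, self.tokens = name, marker, tokens
    def parse(self, data):
        result = {}
        for token in self.tokens:
            result[token.name], data = token.parse(data)
        return result
class Protocol(object):
    def __init__(self, version, messages):
        self.version = version
        # Map each marker byte to the Message that knows how to parse it
        self.messages = dict((struct.pack('!B', m.marker), m)
                             for m in messages)
    def parse(self, data):
        return self.messages[data[0:1]].parse(data[1:])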

However, no support is given for handling multiple versions of the protocol, and the amount of name duplication makes it really cumbersome - so I set out to create a better version.

Some things complicated the creation of a better version. The worst problem of them all proved to be me, myself and I. At this time I had used Python for a couple of years, and started to get interested in the more sophisticated tools available. I had just taught myself about metaclasses, and thought they were an ingenious application of object orientation - and having found a shiny new hammer, I was itching to find a nail.

Unfortunately, I had no pressing need for using metaclasses, so I invented one - I wanted to avoid some assignments in the protocol specification, so I used a metaclass to rip out the init (constructor) method and replace it with a version that registered the instance in a global map and then called the original init method. This is wrong in at least three ways: since it was not generic, it could have been done directly in the init method; had it been generic, it would have been a job for a decorator; and it is a really great way to obfuscate the code:

import struct
__MSG_MAPPING__ = {}
def msg_initizer(cls, old_init):
    def new_init(self, name, marker, *args):
        __MSG_MAPPING__.setdefault(cls, {})[name] = self
        __MSG_MAPPING__[cls][struct.pack("!B", marker)] = self
        old_init(self, name, marker, *args)
    return new_init
class RegisterMeta(type):
    def __new__(cls, name, bases, attrs):
        attrs['__init__'] = msg_initizer(cls,
                                         attrs['__init__'])
        return super(RegisterMeta, cls).__new__(cls,
                                      name, bases, attrs)
class Message(object):
    __metaclass__ = RegisterMeta

This is the kind of code I'm not proud of, by the way. The worst part? It didn't even remove the duplication, although it lowered it somewhat - and the global registration of messages when loading a protocol really messed up any attempt at multiple version support. This was not the only problem; I also went overboard and wanted to support specifying protocol syntax, using a Flow class that defined the legal ordering of messages. This might have been a good idea had we actually had any such requirements in our protocols; since they are "authenticate, then do anything", adding support for this just expanded the codebase and made the protocol specification more complex for extremely little gain (especially since we authenticate in different ways depending on the client). Adding insult to injury, this is even more verbose than the very first try.

#In protocol.py
imei = Token('imei', 'long')
message = Token('message', 'String')
timestamp = Token('timestamp', 'long')
signal = Token('signal', 'short')
voltage = Token('voltage', 'short')
auth = Token('auth', 'String')
Markers({'LOG': 0x023,
    'ALIVE': 0x021,
    'AUTH': 0x028})
Message('LOG', imei, timestamp, signal, voltage)
Message('ALIVE', imei, timestamp)
Message('AUTH', imei, timestamp, auth)
Flow([('AUTH',), ('LOG', 'ALIVE')])
#Usage
protocol = Protocol(version=2.0)
parsed_data = protocol.parse(data) #error if not auth parsed

This entire attempt became a cautionary example - it shows the danger of finding new and interesting technology and applying it before grokking it, and it shows the danger of over-engineering and feature creep. Luckily, once I got a good look at what I had created, even me-a-few-years-back could see that this was an abomination, and it was subsequently quietly taken out back and put down without even making it as far as integration tests.

Finally - and this is the version still in use - I decided to apply a carefully measured amount of standard library magic to make the specifications more terse, and to remove stuff that we did not need. This made the specification look something like this instead:

#In protocol_4.2.py:
#Tokens
t('message', string)
t('timestamp',  i64)
t('signal', i16)
t('voltage', i16)
#Messages
LOG = ('A log message containing some debug info',
         0x023, timestamp, message, signal, voltage)
ALIVE = ('A message to signal that the terminal is alive',
     0x021, timestamp)
#Usage
protocols = load_protocols('protocols')
parsed = protocols[4.2].parse(data)
protocols[4.2].write_java() #Writes to Protocol42.java

At one time, it was even terser (as in the earlier blog post), but that version didn't really pan out, and the version in production is very similar to this one. Name duplication is avoided using two different techniques - the tokens are defined by calling a function t that creates the token and injects it back into the calling namespace using the supplied name:

#In types.py
from inspect import currentframe
def t(name, data_type):
    """Inserts name = (name, data_type) in locals()
    of calling scope"""
    currentframe().f_back.f_locals[name] = (name, data_type)

To some, this may seem like blasphemy, but consider this: the implementation is extremely simple in concept, it gets the work done, and it is easy to explain. Another change is that the messages are created solely by using inspect to extract members of the module that look like messages - a name in all caps, and a tuple. Worth noting is that there was error handling initially, but I removed it to make parsing fail loudly, rather than accept a specification that may or may not have contained errors.
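The extraction itself can be as small as this (an illustrative sketch, not the production code):

import inspect
def find_messages(module):
    # Module members named in ALL CAPS and holding tuples
    # are treated as message definitions
    return dict((name, value)
                for name, value in inspect.getmembers(module)
                if name.isupper() and isinstance(value, tuple))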

Finally, Java source and HTML documentation are created by traversing the protocol instance and feeding the information into simple templates - experiments were made with literate programming using ReST to create documentation, but in the end that tended to obfuscate rather than the reverse. This may be an effect of a naive implementation, or of the problem not lending itself well to literate programming, but either way it was not worth it in this case.

There is a working and slightly generalized version available at Bitbucket, and if you would like to hear more about this (and more details about the Python magic used), you can buy a ticket to EuroPython - you'll have until Sunday to vote for my proposals (and others).

Blaag created 120329 10:22

PyCon 2012 - the other stuff

I have tried to do a full writeup of my PyCon experience this year, and failed miserably, so this is what I’ll do: This post will focus only on the conference experience - lessons learned, sessions attended, and projects discovered will have their own posts; this is the other stuff.

So what about the conference as a whole? It was, just like Atlanta last year, an overwhelmingly positive experience. This was the first time I volunteered, and I really felt that that was a given win - from getting to have a say in the program by joining the program committee, through doing a session as a session runner and getting to see all the work that goes on behind the scenes, to responding to a just-in-time tweet to join the swag-packing party. Just standing somewhere and looking confused would prompt someone more experienced to help you out, and people were just so genuinely nice. Will definitely do again.

The venue was good, although as others have already remarked, the open spaces were too far away from the main rooms - I believe this made both the BoFs and the hallway track a bit less exciting than last year.

Food was acceptable, and lunch was served in a timely fashion - breakfast was awesome the first day, and good the following days.

The swag was good, apart from the orange bottle opener - you know who you are. Also, so many t-shirts!

(I really need to fix proper tags for posts so that I don't have to hack a post on PyCon into the Python RSS feed used by Planet Python...)

Blaag created 120322 21:30

[rant] Dare to show your code

My name is Fredrik, and sometimes I write code I’m not that proud of.

A friend of mine started on a Python project recently, and when I asked him to put it up on Bitbucket his response was an immediate and not-quite-mock “But then people will see my code!”. I believe this fear of showing one’s code is common, and I believe that it is a problem. Not so much for open source, or anything like that, but for the individual - it suggests a belief that your code isn’t good enough, that other people’s code is better, and/or that offering your code up for others to see will lead to rejection and ridicule. I know I was wary before suggesting a patch to Python, because I feared it was not good enough (it was, but the tests weren’t - nobody was mean in telling me they needed work to conform). I had over ten repositories at Bitbucket before I open sourced the first one, and I spent too much time worrying over the autohook source before daring to make it public... for no good reason at all.

Sometimes I’m not proud of my code; I was in a hurry, I was new to the tools or the domain, I was lazy, I did not know better at the time, there were customer demands I could not fulfill in any other way, or any other reason or excuse. Sometimes somebody tells me I should clean up my code - and that is good. Having others critique your code is one of the best ways of getting better, and knowing that others will look at your code will make you (at least it makes me) write better code. This is why code review is such a powerful tool - even if the reviews seldom find serious errors, people tend to write better code when they know somebody will read it right now, as opposed to that someone will be forced to read it when maintaining the code base some time in the future.

Added to this: as much as we may feel bad when showing our own code, so does everybody else, from time to time. Not all code will be perfect - I’d argue that if we spent the time to make all code perfect, we’d never get anything done. Besides, not everyone will agree on what perfect code entails, so it’s a fool's errand - someone will always think there are things about your code that are imperfect. And that’s OK. Showing off my code online (some of it really ugly, like the Blaag source code) has netted me all of two immature flames, but also some clever insights, pull requests with bug fixes, and in one case even someone to discuss the code with (thanks Martijn!).

Blaag created 120302 14:01

Python Closures and Decorators (Pt. 2)

Edit: got complaints that code was hard to read, trying out Pygments.

In part 1, we looked at sending functions as arguments to other functions, at nesting functions, and finally we wrapped a function in another function. We'll begin this part by giving an example implementation of the exercise I gave in part 1:

>>> def print_call(fn):
...   def fn_wrap(*args, **kwargs):
...     print("Calling %s with arguments: \n\targs: %s\n\tkwargs:%s" % (
...            fn.func_name, args, kwargs))
...     retval = fn(*args, **kwargs)
...     print("%s returning '%s'" % (fn.func_name, retval))
...     return retval
...   fn_wrap.func_name = fn.func_name
...   return fn_wrap
...
>>> def greeter(greeting, what='world'):
...     return "%s %s!" % (greeting, what)
...
>>> greeter = print_call(greeter)
>>> greeter("Hi")
Calling greeter with arguments:
     args: ('Hi',)
     kwargs:{}
greeter returning 'Hi world!'
'Hi world!'
>>> greeter("Hi", what="Python")
Calling greeter with arguments:
     args: ('Hi',)
     kwargs:{'what': 'Python'}
greeter returning 'Hi Python!'
'Hi Python!'
>>>

So, this is at least mildly useful, but it'll get better! You may or may not have heard of closures, and you may have heard any of a large number of defenitions of what a closure is - I won't go into nitpicking, but just say that a closure is a block of code (for example a function) that captures (or closes over) non-local (free) variables. If this is all gibberish to you, you're probably in need of a CS refresher, but fear not - I'll show by example, and the concept is easy enough to understand: a function can reference variables that are defined in the function's enclosing scope.

For example, take a look at this code:

>>> a = 0
>>> def get_a():
...   return a
...
>>> get_a()
0
>>> a = 3
>>> get_a()
3

As you can see, the function get_a can get the value of a, and will be able to read the updated value. However, there is a limitation - a captured variable cannot be written to:

>>> def set_a(val):
...   a = val
...
>>> set_a(4)
>>> a
3

What happened here? Since a closure cannot write to a captured variable by plain assignment, a = val actually creates a local variable a that shadows the module-level a that we wanted to write to. (In this example the enclosing scope is a module, so declaring global a would also work; for variables captured from an enclosing function, Python 3 adds the nonlocal keyword.) To get around this limitation (which may or may not be a good idea), we can use a container type:

>>> class A(object): pass
...
>>> a = A()
>>> a.value = 1
>>> def set_a(val):
...   a.value = val
...
>>> a.value
1
>>> set_a(5)
>>> a.value
5

So, with the knowledge that a function captures variables from it's enclosing scope, we're finally approaching something interesting, and we'll start by implementing a partial. A partial is an instance of a function where you have already filled in some or all of the arguments; let's say, for example that you have a session with username and password stored, and a function that queries some backend layer which takes different arguments but always require credentials. Instead of passing the credentials manually every time, we can use a partial to pre-fill those values:

>>> #Our 'backend' function
... def get_stuff(user, pw, stuff_id):
...   """Here we would presumably fetch data using the supplied
...   credentials and id"""
...   print("get_stuff called with user: %s, pw: %s, stuff_id: %s" % (
...         user, pw, stuff_id))
...
>>> def partial(fn, *args, **kwargs):
...   def fn_part(*fn_args, **fn_kwargs):
...     kwargs.update(fn_kwargs)
...     return fn(*args + fn_args, **kwargs)
...   return fn_part
...
>>> my_stuff = partial(get_stuff, 'myuser', 'mypwd')
>>> my_stuff(3)
get_stuff called with user: myuser, pw: mypwd, stuff_id: 3
>>> my_stuff(67)
get_stuff called with user: myuser, pw: mypwd, stuff_id: 67

Partials can be used in numerous places to remove code duplication where a function is called in different places with the same, or almost the same, arguments. Of course, you don't have to implement it yourself; just do from functools import partial.
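Using the standard library version looks just like our hand-rolled one (continuing with the get_stuff function from above):

>>> from functools import partial
>>> my_stuff = partial(get_stuff, 'myuser', 'mypwd')
>>> my_stuff(3)
get_stuff called with user: myuser, pw: mypwd, stuff_id: 3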

Finally, we'll take a look at function decorators (there may be a post on class decorators in the future). A function decorator is (can be implemented as) a function that takes a function as parameter and returns a new function. Sounds familiar? It should, because we've already implemented a working decorator: our print_call function is ready to be used as-is:

>>> @print_call
... def will_be_logged(arg):
...   return arg*5
...
>>> will_be_logged("!")
Calling will_be_logged with arguments:
     args: ('!',)
     kwargs:{}
will_be_logged returning '!!!!!'
'!!!!!'

Using the @-notation is simply a convenient shorthand for doing:

>>> def will_be_logged(arg):
...   return arg*5
...
>>> will_be_logged = print_call(will_be_logged)

But what if we want to be able to parameterize the decorator? In that case, the function used as a decorator will receive the arguments, and will be expected to return a function that wraps the decorated function:

>>> def require(role):
...   def wrapper(fn):
...     def new_fn(*args, **kwargs):
...       if role not in kwargs.get('roles', []):
...         print("%s not in %s" % (role, kwargs.get('roles', [])))
...         raise Exception("Unauthorized")
...       return fn(*args, **kwargs)
...     return new_fn
...   return wrapper
...
>>> @require('admin')
... def get_users(**kwargs):
...   return ('Alice', 'Bob')
...
>>> get_users()
admin not in []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in new_fn
Exception: Unauthorized
>>> get_users(roles=['user', 'editor'])
admin not in ['user', 'editor']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in new_fn
Exception: Unauthorized
>>> get_users(roles=['user', 'admin'])
('Alice', 'Bob')

...and there you have it. You are now ready to write decorators, and perhaps use them to write aspect-oriented Python; adding @cache, @trace or @throttle is trivial (and before you add @cache, do check functools once more if you're using Python 3 - lru_cache has been there since 3.2!).
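A final note: the name-copying we did by hand in print_call is already handled by the standard library - functools.wraps copies __name__, __doc__ and friends onto the wrapper. A minimal sketch of print_call using it:

>>> from functools import wraps
>>> def print_call(fn):
...   @wraps(fn) #copies fn's metadata onto fn_wrap
...   def fn_wrap(*args, **kwargs):
...     print("Calling %s" % fn.__name__)
...     return fn(*args, **kwargs)
...   return fn_wrap
...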

Blaag created 120301 19:04

Python Closures and Decorators (Pt. 1)

Since I, in retrospect, made the wrong choice when cutting down a Python course to four hours and messed up the decorator exercise, I promised the attendees that I'd make a post about closures and decorators and explain them better - this is my attempt to do so.

Functions are objects, too. In fact, in Python they are First Class Objects - that is, they can be handled like any other object with no special restrictions. This gives us some interesting options, and I'll try to move through them from the bottom up.

A very basic case of using the fact that functions are objects is to use them as you would a function pointer in C; pass it into another function that will use it. To illustrate this, we'll take a look at the implementation of a repeat function - that is, a function that accepts another function as argument together with a number, and then calls the passed function the specified number of times:

>>> #A very simple function
>>> def greeter():
...   print("Hello")
...
>>> #An implementation of a repeat function
>>> def repeat(fn, times):
...   for i in range(times):
...     fn()
...
>>> repeat(greeter, 3)
Hello
Hello
Hello
>>>

This pattern is used in a large number of ways - passing a comparison function to a sorting algorithm, passing a decoder function to a parser, and in general specializing the behaviour of a function, or passing specific parts of a job into a function that abstracts the work flow (i.e. sort() knows how to sort lists, compare() knows how to compare elements).
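The built-in sorted() is a handy everyday example of the same pattern - it accepts a key function that specializes how elements are ordered:

>>> sorted(["Python", "is", "fun"], key=len)
['is', 'fun', 'Python']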

Functions can also be declared in the body of another function, which gives us another important tool. In the most basic case, this can be used to "hide" utility functions in the scope of the function that uses them:

>>> def print_integers(values):
...   def is_integer(value):
...     try:
...       return value == int(value)
...     except (TypeError, ValueError):
...       return False
...   for v in values:
...     if is_integer(v):
...       print(v)
...
>>> print_integers([1,2,3,"4", "parrot", 3.14])
1
2
3

This may be useful, but is hardly in itself a very powerful tool. Combined with the fact that functions can be passed as arguments, however, it means that we can add behaviours to functions after they are constructed, by wrapping them in another function. A simple example would be to add trace output to a function:

>>> def print_call(fn):
...   def fn_wrap(*args, **kwargs): #take any arguments
...     print("Calling %s" % (fn.func_name))
...     return fn(*args, **kwargs) #pass any arguments to fn()
...   return fn_wrap
...
>>> greeter = print_call(greeter) #wrap greeter
>>> repeat(greeter, 3)
Calling greeter
Hello
Calling greeter
Hello
Calling greeter
Hello
>>>
>>> greeter.func_name
'fn_wrap'

As you can see, we can replace the greeter function with a new function that uses print to log the call, and then calls the original function. As seen in the last two rows of the example, the name of the function reflects that it has been replaced, which may or may not be what we wanted. If we want to wrap a function while keeping the original name, we can do so by adding a row to our print_call function:

>>> def print_call(fn):
...   def fn_wrap(*args, **kwargs): #take any arguments
...     print("Calling %s" % (fn.func_name))
...     return fn(*args, **kwargs) #pass any arguments to fn()
...   fn_wrap.func_name = fn.func_name #Copy the original name
...   return fn_wrap

Since this is rapidly turning into a very long post, I'll stop here and return tomorrow with part two, where we'll look at closures, partials, and (finally) decorators.

Until then, if this is all new to you, use print_call as a base to create a function that will print function name and arguments passed before calling the wrapped function, and the function name and return value before returning.

Update: part 2

Blaag created 120229 20:36

Blaag using Genshi, greedy bloggers adding ads, and demographics. Also ranting.

(There will be very little technical content here; if that's what you are looking for, move along)

So, I updated Blaag to use Genshi for templating (still in its own branch), which was a pretty pleasant experience; I cleared out some of the worst code from blaag.py and ended up with a single HTML template instead of a host of snippets.

I also added a left column to the template (should probably make it optional) so that only blog links live in the right column. This made me go over the 960px width that I used to have, but a quick look at my visitor statistics suggests that most visitors won't be adversely affected - I still intend to add a lite/mobile alternate style, since I do want anyone to be able to read the blog.

On the topic of visitors, you are a curious bunch, but perhaps what could be expected of readers of a tech blog: 50% use Chrome, 21% Firefox, 12% Safari, 6% the Android browser, 6% 'Mozilla Compatible', 2% Opera, 1.5% IE, and then a gaggle of uncommon browsers. 31% use Windows, 30% OSX, 18% Linux of some sort, and the rest are mainly mobile OSes. About 10% of visitors block Google Analytics, according to the server logs.

Also on the topic of visitors, I thought that hey, I have a lot of visitors, I should be able to get some cash out of the hours spent and maybe make hosting break even (or rather, move haard.se to a host of its own)! I figure that any readers who take offense at ads are in the 10% that block Analytics, and will block the ads too, so nothing is stopping me.

Well, I tried to read up a bit on ad programs and ads on tech blogs, but there's not a whole lot of signal in all the noise that turns up; mainly it's advertising for ad programs, and 'this works/does not work, because I say so'. Figures, or even mildly solid-looking arguments, are missing. For now I have tossed an AdSense ad in there just because it was quick and I could force a look and feel; I suspect it will attract about zero clicks. I could allow a banner, but AFAIK I cannot then preemptively block ugly, animated or just plain wrong ads. In addition, it seems that the AdSense ads load synchronously, slowing down the site to show ads, which is wrong in itself; nobody visits the site to view ads, after all.

So, I'll probably have to look for another program (or just not have ads), but I have no idea what that would be, since everyone is so damn secretive about their figures. Right now I believe I have about 50K visitors a month who do not block JavaScript in general or Google in particular (50K since the 8th of February, when the blog went live for 'real'), but I have no idea whatsoever where that puts me in the great scheme of things, since people don't seem to talk about their visitor numbers. Nor do I have any idea where I should look for ads that a) are at least somewhat interesting to developers, b) do not slow down page loading, and c) pay enough to be worth the overhead of maintaining them for a non-webmaster.

Meh.

Blaag created 120227 17:53

Your favourite programming language is not good enough

I was blown away by the amount of response - mostly positive - to my Python is important post. However, a lot of the replies, both positive and... slightly less positive, really highlighted an issue I have with how a lot of developers seem to approach programming languages: the search for the Perfect Language to Love and Protect. Why are so many developers so very emotional when it comes to their favourite programming language? Considering that no language can (yet) magically translate the perfect idea in your head into machine code, all of them exist on a scale of badness - they all limit you more than your own thoughts or the hardware does.

I believe that the primary reason people feel the need to vehemently defend a particular language is that they are lazy. Of course, good programmers are always lazy (why else automate?), but this is a specific and very bad laziness - being too lazy to learn. If my favourite language is better than anything else, or maybe at least just as good as anything else, I don't have to spend time and effort learning new languages.

The main problem with this is not only that you won't find the perfect language, but that when you're only comfortable in one or two languages, the way you solve problems becomes limited by what's possible in those languages - and if the languages you know are similar and from the same paradigm, the problem gets worse.

When you choose a language to solve a problem, by all means, use the language you feel you will solve it best in - the more powerful, the more productive, the most comfortable, the one with the most libraries... but if you want to be a serious programmer or developer, rather than someone who dabbles a bit in programming, you need to learn new languages, and you need to stop believing that you have found the one language that is better than the rest. All programming languages have made trade-offs, and none is perfect. I would argue that some languages are better than others, but no language is the best at everything, and no language got everything right. Python has its own problems (it's not that it is dynamically typed), so have the different Lisp dialects (it's not that they have too many parentheses), and so has Haskell (it's the indisputable fact that it is weird*).

Learn new languages. Learn not to be partisan and defend 'your' language against any criticism. If you haven't already, read Structure and Interpretation of Computer Programs, and learn some form of Lisp - it will make you see and feel the limitations of other languages, and the pain will make you a better programmer, whatever language you use.

* No, I'm not being serious. Haskell is next on my list of languages to learn.

Blaag created 120224 16:30

Python API to git: gitapi

Train rides can be good - if not for creativity, then at least for boredom-induced productivity. I had planned to make an hgapi fork that worked against git instead of Mercurial, and during the ride back from holding a Python workshop in Malmö (a three-hour trip back to Karlskrona; I still have another hour to go...), I finally did. gitapi is born, and even though it's in its infancy, it supports a large number of common operations. Fair warning - I basically just switched hg->git and then fixed the test cases that still made sense, so there are bound to be some kinks left to iron out. Either way, the test suite passes on both Python 2.7 and 3.2, so there's something.

Blaag created 120221 21:24

Explaining comprehensions to programmers

For the first year or two programming Python, I never used list comprehensions (at the time, those were the only comprehensions). I read about them, I kinda figured out how they worked, and then I stuck to map() and filter(), which I understood. Looking back, I think that this has a lot to do with the fact that explanations of comprehensions are done using their origin - mathematics - rather than the domain we use them in - programming.

A quick duckduckgo search tells me this is still the case - Wikipedia asks us to consider something like this: S = {2·x | x ∈ N, x > 3}, and other sources also seem to start out with ‘this is how it's done in math, so...’ (a notable exception is the tutorial on python.org).

When talking to programmers, I’d like to explain comprehensions differently, because not all programmers have a background in mathematics. For a programmer, a list comprehension is simply a for loop for constructing lists, using a more declarative notation than your usual for loop. For those of us used to map() and filter(), list comprehensions are both of those as well.

Consider:

def loop(my_list):
    result = []
    for x in my_list:
        if x > 3:
            result.append(x*2)
    return result

Ever written code like this? This is code that explicitly states which steps should be taken to construct your list; but you don’t have to - you can instead state what you want:

def compr(my_list):
    return [x*2 for x in my_list if x > 3]

This translates to give me value*2 for every value in my_list, but only if that value is more than three. Note also that this expression does the work of both map (multiply by two) and filter (take only values that are greater than three). The general case would be [add something to the list for each value in an iterable, optionally only if a condition is True for that value].
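For comparison, the same function written with map() and filter() (which return lists in Python 2):

def mapfilter(my_list):
    return map(lambda x: x*2, filter(lambda x: x > 3, my_list))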

Comprehensions also work nested - consider this simple but ugly code:

def create_matrix_loop(size, default):
    new_matrix = []
    for y in range(size):
        row = []
        for x in range(size):
            row.append(default)
        new_matrix.append(row)
    return new_matrix

Sample output:

>create_matrix_loop(3, None)
[[None, None, None],
 [None, None, None],
 [None, None, None]]

Since comprehensions can be nested, this can be replaced with:

def create_matrix_compr(size, default):
    return [[default for x in range(size)] for y in range(size)]

As an added bonus, when we don’t tell the compiler how we want to do something, but rather what we want done, it can generate better - faster - bytecode for us. The loop version of create_matrix is translated into 35 bytecode instructions, and the version using a list comprehension is only 20 (try import dis; dis.dis(func) to see what func looks like in bytecode) and in reality, you will often avoid making a function at all when using comprehensions since they’re terse enough on their own, making this difference even bigger. Timing the implementations, the difference is evident:

>timeit -n100 create_matrix_loop(1000, None)
100 loops, best of 3: 113 ms per loop
>timeit -n100 create_matrix_compr(1000, None)
100 loops, best of 3: 49.1 ms per loop

That's right: less code, declarative syntax, and faster execution! (Note: I used iPython when creating and timing the examples - it's awesome and you should try it)

Blaag created 120216 11:10

Why Python is important for you

I believe that Python is important for software development. While there are more powerful languages (e.g. Lisp), faster languages (e.g. C), more used languages (e.g. Java), and weirder languages (e.g. Haskell), Python gets a lot of different things right, and right in a combination that no other language I know of has done so far.

It recognises that you’ll spend a lot more time reading code than writing it, and focuses on guiding developers to write readable code. It’s possible to write obfuscated code in Python, but the easiest way to write the code (assuming you know Python) is almost always a way that is reasonably terse and, more importantly, one that clearly signals intent. If you know Python, you can work with almost any Python codebase with little effort. Even libraries that add “magic” functionality can be written in perfectly readable Python (compare this to understanding the implementation of a framework such as Spring in Java).

Python also acknowledges that speed of development is important. Readable and terse code is part of this, and so is access to powerful constructs that avoid tedious repetition of code. Maintainability also ties into this - LoC may be an all-but-useless metric, but it does say something about how much code you have to scan, read and/or understand to troubleshoot problems or tweak behaviours.

This speed of development, the ease with which a programmer of other languages can pick up basic Python skills, and the huge standard library are key to another area where Python excels - toolmaking. Any project of size will have tasks to automate, and automating them in Python is in my experience orders of magnitude faster than using more mainstream languages - in fact, that was how I started out with Python, creating a tool to automate configuring Rational Purify for a project where it had been such a chore that it was never run (and memory leaks were not fixed). I’ve since created tools to extract information from ticket systems and present it in a way useful to the team, tools to check poms in a Maven project, Trac integration, custom monitoring tools... and a whole lot more. All of those tools were quick to implement, saved a lot of time, and several of them have later been patched and updated by people with no Python background - without breaking.

That building custom tools is easy hints at another strength - building and maintaining custom software is easy, period. This is why, while the quite huge Django framework might be the most famous Python web framework, there is also a host of successful small frameworks and micro-frameworks. When working in a powerful programming language with a wide array of standard and third-party libraries, you often don’t need to accept the trade-offs that are necessary when using any large off-the-shelf framework. This means that you can build exactly the software your customers want, rather than telling them that ”this is how it’s done, sorry”. To me, this is a huge difference. I feel ashamed when I have to tell a customer that no, sorry, this seems like a simple requirement, but the framework we use makes it impossible or prohibitively expensive to implement. Whenever this happens, you have failed. Writing software that fits into the customer’s model rather than into a framework is important, and I for one feel that a lot of developers today have lost sight of that simple fact. A lot of programmers now spend more time being configurators of frameworks and making excuses for their shortcomings than actually programming.

Finally, if you’re a boss-wo/man or general manager, using Python has a final benefit - Python programmers run into less frustration*, which makes them happier, and even more productive!

(*may not be true when installing source-distributed C extensions on Windows)

Blaag created 120211 10:26

Using Python to get rid of .doc

I'll be appearing at Software Passion to speak about using Python for protocol specifications, instead of writing the specification in an external document and then trying to implement it from there (or, perhaps more commonly, implementing it and then trying to keep the document up to date).

A while ago at Visual Units, the situation was this: there was a protocol to transfer data over TCP from fleet management black boxes running J2ME to a server running Python, which then stored the data so interesting things could be done with it. Accompanying the protocol was an ever-slightly-out-of-date protocol specification, and a client implementation in Python used for testing the server.

This meant that we had four different implementations of the protocol: one in Java, two in Python, and one in English. If one of these was not updated when the others were, the system was no longer consistent, and might break in interesting ways.

Since this created a lot of work for me, I set out to change things. First, I searched for viable existing solutions, but the need to keep the protocol compact (telematics data transfer is expensive), and J2ME support meant I did not find anything to use off the shelf.

Instead I started to implement my own solution, with the vision that I would specify the protocol once, and use it everywhere - Java, Python, and English. In the end, using a couple of hundred rows of Python, we can now specify a protocol thus:

message = string
timestamp = i64
timediff = i32
ping = ("A ping, with a time and message",
         timestamp, message)
pong = ("A pong, with message, timestamp and perceived lag",
        timestamp, timediff, message)

...and from this, we create Java source code for the terminals, the Python clients and servers use it directly when packing and parsing messages, and the documentation for the poor souls who might want to read English instead of Python is generated.
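To make it concrete how a spec like this can drive the wire format, here is a rough, simplified sketch - entirely my illustration; the real implementation uses proper token types and handles much more:

import struct

string, i32, i64 = 'string', 'i32', 'i64' #stand-ins for real token types
timestamp, timediff, message = i64, i32, string

ping = ("A ping, with a time and message", timestamp, message)

FMT = {i32: '>i', i64: '>q'}

def pack(spec, *values):
    """Pack values according to a (description, type, type, ...) spec"""
    payload = b''
    for ftype, value in zip(spec[1:], values):
        if ftype == string: #length-prefixed UTF-8
            data = value.encode('utf-8')
            payload += struct.pack('>H', len(data)) + data
        else:
            payload += struct.pack(FMT[ftype], value)
    return payload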

Want to know how this was made possible, see some code, and point and laugh at my miserable attempts that failed? Want to know why meta-classes were absolutely vital - or not? Register for Software Passion, where I'll be talking about this - if you use the promotion code 'BLAAG' when registering, you'll even get a 10% discount!

Blaag created 120207 18:36

Mercurial in Python 3: promoting hgapi

When I took a look at the python.org Py3k poll, I saw that Mercurial was on the top list of things people wanted ported (though far behind the likes of Django). Now, I don't know why others want to tie into Mercurial from code, but if a little performance overhead from using the CLI isn't critical - such as when writing hooks, or integrating version control into some tool or other - you might want to consider hgapi.

hgapi uses only the command line interface, and was created to be able to release autohook under a more permissive license than the GPL - and it's tested against both Python 2.7 and Python 3.2. It now supports most (for a given computation of 'most') operations in Mercurial, and as a bonus there are no open feature requests - so if there is something you miss, this is the time to request it!

Also, don't forget to register for PyCon, early bird rates until 25/1!

Blaag created 120119 17:00

What's the point of properties in Python?

A few days ago I was asked by a colleague what the point of properties in Python is. After all, writing properties is as much text as writing getters and setters, and they don't really add any functionality except not having to write '()' on access.

On the surface, this argument holds as we can see by comparing a simple class implemented with getters and setters, and with properties.

Implemented with getters and setters:

>>> class GetSet(object):
...   x = 0
...   def set_x(self, x):
...     self.x = x
...   def get_x(self):
...     return self.x
...
>>> getset = GetSet()
>>> getset.set_x(3)
>>> getset.get_x()
3

And implemented with properties:

>>> class Props(object):
...   _x = 0
...   @property
...   def x(self):
...     return self._x
...   @x.setter
...   def x(self, x):
...     self._x = x
...
>>> props = Props()
>>> props.x = 5
>>> props.x
5

The point

In fact, we've gone from 196 to 208 chars in this simple use case - so why would we use properties at all?

The answer is, that in this use case we would not. In fact, we would write thus:

>>> class MyClass(object):
...   x = 0
...
>>> my = MyClass()
>>> my.x = 4
>>> my.x
4

'But!', I can hear you scream, 'there's no encapsulation!'. What will we do if we need to control access to x, make it read-only, or do something else to it? Won't we have to refactor everything to use the getters and setters that we avoided?

No - we just switch to the property version, add whatever we want, and we have not changed the interface one iota! The great thing about properties is not that they replace getters and setters, it's that you don't have to write them to future-proof your code. You can start out by writing the simplest implementation imaginable, and if you later need to change the implementation you can do so without changing the interface. Neat, huh?
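For example (my illustration, not from the original text): if x must suddenly never be negative, only the class body changes - client code is untouched:

>>> class MyClass(object):
...   _x = 0
...   @property
...   def x(self):
...     return self._x
...   @x.setter
...   def x(self, value):
...     if value < 0:
...       raise ValueError("x must not be negative")
...     self._x = value
...
>>> my = MyClass()
>>> my.x = 4 #same interface as the plain-attribute version
>>> my.x
4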

Blaag created 120115 13:03

hgapi 1.1.0

As a belated Christmas gift, I just released hgapi version 1.1.0. New since 1.0.1 is support for hg status, merge and revert. This means that I right now have no firm plans for the future, as the tool does what I need it to do. If you have other requirements, add them to the issue tracker.

Blaag created 111229 13:21

Jenkins

Recently, I wanted to migrate some lightweight services from a virtual host to an account at Webfaction, since running a wiki/issue tracker (Trac) and a CI server (Jenkins) for a couple of low-volume projects really shouldn’t take a whole machine of its own. Or should it?

This is when I realized that Jenkins is heavyweight in a world of cloud and shared hosts. I already kinda knew, since I've been administrating a Jenkins installation that claims several gigs of RAM, but that's with over thirty Maven projects and a pretty high load.

Firing Jenkins up and configuring two (Ant) projects, it claims ~150MB of RAM - for doing nothing. On a shared host, that’s unacceptable. Under the old RAM limits on Webfaction it would have been impossible to run; now it’s just claiming 3/5 of my total memory allotment.

So yesterday I set up continuous integration using a Python script that runs the build, determines failure or success, and publishes the log and/or artifacts; it took a bit less than an hour to get working from scratch (admittedly using my own Mercurial lib for integration).

Now I'm thinking of maybe creating something useful out of this. Right now, I publish logs as static web pages, but I could just post them as wiki pages to Trac via RPC. That would allow logs to tie into the ticket system and source browser, and I could show build status right there on the Kanban-ish-board next to the tickets.

I've got no firm design done yet, but I'm thinking about what the requirements for a minimalistic CI tool should be:

  • Take no resources apart from disk space when idle
  • Be able to publish fail and success logs in a useful format
    • It should be trivial (for real) to implement support for new targets for result publishing
  • Ability to limit resource usage (number of concurrent builds)
  • Jobs can trigger each other
  • more?

Blaag created 111228 13:21

Problem exists between chair and keyboard

It was pointed out to me that an entry was missing. My initial reaction was "No, it's not!", but then I had to confess that yes, an entry that had been posted was missing. This was because I've developed (and written) for blaag on several computers, and I'd accidentally made a closed head tip and lost an update. Maybe I need to add some functionality to make sure that existing posts do not disappear.

Blaag created 111228 13:04

Autohook updated

When I linked to autohook the other day, I was not prepared for somebody actually trying it - and telling me that it did not work.

So, eating my own dogfood, I set up the released version of autohook to run Blaag generation, and realized that it did, indeed, not work. This is fixed now in version 1.1.0, which also does away with using "if __name__ ..." and instead uses setuptools magic to create runnable scripts.
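That 'setuptools magic' is the console_scripts entry point mechanism; a minimal sketch of a setup.py using it (illustrative only - not autohook's actual file, and the autohook:main target is assumed):

from setuptools import setup

setup(name='autohook',
      version='1.1.0',
      py_modules=['autohook'],
      entry_points={
          'console_scripts': [
              #creates an 'autohook' executable calling autohook.main()
              'autohook = autohook:main',
          ],
      })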

In addition, I realized that setting up hooks for a single repo was kind of a pain, so I simplified the configuration for that use case.

Blaag created 111221 13:48

Tools for better Python

TL;DR
pip install virtualenv pylint ipython autohook

Tools I use every day to write better Python, to make it more fun, or just easier:

  • A good editor: I prefer Emacs, you might like something else, but trust me on this - it’ll take a humongous project to force you to use a full IDE. If you stay clear of large web frameworks, you might never need one. I started out using PyDev since I was used to Eclipse, but now I just don’t think it’s worth the complexity and overhead.
  • virtualenv: I use my virtualenvs for more than just Python these days, and setting a new environment up is the first thing I do when starting a new project.
  • pylint: Not only does it tell you what you might want to fix in your code, it tells you if your code gets better or worse. The more unsure you are, the more you should use pylint.
  • ipython: While the standard REPL is nice, IPython is truly awesome for testing and prototyping. For me, it’s gradually replacing bash as well.
  • unittest/unittest2: Python comes with built-in unit test support - use it (where appropriate)!

These are the tools I use in almost any project, and recently I’ve added one of my own:

  • autohook, to run pylint/unit tests on commit to Mercurial

Finally, I’d like to point anyone starting out with Python to the excellent introduction set up by Mir Nazim at his site.

Blaag created 111220 09:13

On useless testing

All testing is not valuable. There. I said it.

If you take a look at the source of Blaag, you might notice a certain lack of tests. No unit test, no tests at all in fact. Does this mean I do not believe in unit tests, TDD and testing in general? No! If you take a look at hgapi, for example, I wrote almost all code using TDD since that was the only way to know I got it right.

When starting on Blaag (which at the time did not have a name), I began by creating testblaag.py, writing import unittest - and then I froze. I had no idea what a test for Blaag would look like. Everything Blaag does is glue code. It fetches data, feeds it to docutils, collects some additional data from Mercurial, and creates documents using string.Template and an RSS feed using PyRSS2Gen.

There are some utility functions (implemented as functions or not) that I could have created unit tests for, but what information would I draw from writing a test for sum([int(i) for i in hgdate_string.split()])? I write this code for me, and for me this code is obvious. So how do I know it works? I test it. Manually, since generating the entire html source is the only way for me to know that Blaag works as I intend it to work.

Whereas hgapi was a tool written for others to use and adapt - and one where I could not easily look at the result and see if it was correct - Blaag is easy to verify: I look at the rendered site, in my browser. If it does not look OK, I have a bug. If it works, I have NO bugs. I might have potential bugs, like the fact that the -f option is currently required when updating, but if the code generates the result I want, consistently, and in reasonable time, Blaag performs perfectly.

Any test would simply be more code that did not add information or value - and there is a name for that kind of code: bloat. And while in the case of Blaag this is easy to see, I believe that more care should be taken generally when writing tests, just as when writing functionality - the question you should always ask yourself when producing code is simply: what value does this code add? If you cannot answer that question, you probably should not write the code, whether it's a test or not.

Because code unwritten never breaks.

Blaag created 111218 17:46

Styling

New style thanks to @markuseliasson - anything ugly is because of my adaptations, anything looking great is his work. Since the main parts are done (still to be done: Mercurial hook, optional Disqus/analytics), the next post may actually have other content than the Blaag itself!

Blaag created 111218 11:03

RSS support is go!

There's no reason to roll my own when the guys over at Dalke Scientific Software have created the awesome PyRSS2Gen. It's everything I want from a utility library - it's simple, documented by example, it does one thing, and it does it well. Thanks to them, this Blaag now sports an RSS feed!

Also some bug fixes and minor tweaks done.

Blaag created 111217 22:36

Progress is made

Timestamps from Mercurial are now used to insert creation and modification times for posts in Blaag. I've also done some code cleanup and documentation, although there is still work to be done.

Next up is RSS or Atom feed generation; I just have to decide whether to use an existing library or just generate the XML. Knowing myself, I'll probably roll my own, yet again.

Blaag created 111217 20:17

It's alive!

After almost more than four hours of grueling work, my blogging platform codenamed "blaag" works.

I created blaag since I've been thinking about blogging, but didn't like the blogging platforms I found, because they were made of bloat with a little functionality hidden deep within. I did, however, like the idea behind hgblog - especially that it's based around generating the blog from rst using Mercurial hooks, allowing blogging from the comfort of Emacs. In the end, I decided to roll my own.

The code for blaag and the entries for this instance of blaag coexist at Bitbucket, so you can view the source code and the source for the entries themselves there.

The goals of blaag are, in order of priority:
  1. Nicer color scheme
  2. Time/Datestamp posts
  3. RSS support
  4. Add example of Mercurial hooks to repo

I guess documentation should be somewhere up top as well. Suggestions on what color scheme I should use are welcome! (assuming the Disqus integration works)

Blaag created 111216 22:08


Page created using blaag and abusing docutils. RSS Feed generated using PyRSS2Gen.