Fredrik Håård's Blaag

@fhaard
I'm a programmer, consultant, developer, occasional teacher and speaker. Among my least disliked programming languages is Python, and most of these posts are related to Python in one way or another.

Batteries included: Download, unzip and parse in 13 lines

The other day I needed to download some zip files, unpack them, parse the CSV files in them, and return the data as dicts. I did the very same thing a couple of years ago, and although the source is lost, I recall having a Python (2.4?) script of about two screens to do the download, so roughly a hundred lines. When re-implementing the solution now that I know Python and the standard library better, I ended up with 12 lines written in just a few minutes. Edited for blogging clarity, it clocks in at 13 lines:

import zipfile, urllib, csv, os
def get_items(url):
  zip, headers = urllib.urlretrieve(url)
  with zipfile.ZipFile(zip) as zf:
    csvfiles = [name for name in zf.namelist()
                 if name.endswith('.csv')]
    for filename in csvfiles:
      with zf.open(filename) as source:
        reader = csv.DictReader([line.decode('iso-8859-1')
                                  for line in source])
        for item in reader:
          yield item
  os.unlink(zip)

As trivial as it is, I think it is a nice example of just how much you can do with very little (coding) effort.

Edit: I created a gist with a cleaned up version using codecs.getreader. I'll be leaving this version as it is though.
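
For anyone on Python 3, here is a minimal sketch of how the same idea might look with codecs.getreader. It is not the gist itself. The function name iter_zip_csv, the encoding parameter, and the try/finally cleanup are assumptions of this sketch, and urllib.request.urlretrieve stands in for the Python 2 urllib.urlretrieve used above.

```python
import codecs
import csv
import os
import urllib.request
import zipfile


def iter_zip_csv(zip_source, encoding='iso-8859-1'):
    """Yield each row of every .csv member of a zip archive as a dict.

    zip_source may be a path or a binary file-like object.
    """
    decode = codecs.getreader(encoding)
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith('.csv'):
                continue
            with zf.open(name) as source:
                # Wrap the binary member so csv sees decoded text lines.
                for row in csv.DictReader(decode(source)):
                    yield row


def get_items(url):
    """Download a zip from url, yield its CSV rows, then delete the temp file."""
    path, _headers = urllib.request.urlretrieve(url)
    try:
        yield from iter_zip_csv(path)
    finally:
        # Runs even if the caller stops iterating early.
        os.unlink(path)
```

Splitting the parsing from the download means iter_zip_csv can be tried on a local or in-memory zip without touching the network, and the try/finally removes the temporary file even when the caller does not consume every row.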

Blaag created 130331 10:12

