Ian Lewis
Ian Lewis is a web developer living in Tokyo, Japan. His current interests are Django, Python, alternative databases and rapid web application development.
  • Key Value Storage Systems ... and Beyond ... with Python

    Google Docs wouldn't let me share the presentation publicly with people outside our company's domain, and it gave me an error when I tried to download it as a PowerPoint file or PDF, so I had to recreate the presentation locally. Anyway, I've put the slides from my talk at PyCon Asia online; please check them out on SlideShare.

  • python-openvcdiff and Cython

    I started a project today to implement a Python interface to open-vcdiff using Cython. I'm not a C++ master and Python's C API is pretty new to me, but I managed to expose and implement a few methods of the VCDiffEncoder class. The hardest part so far has been figuring out how to use C++ standard library types like std::string. I'm also not sure how to interface with Python in a way that allows fast processing of potentially large binary data. Normally in Python I would use a file-like object for this kind of data, but open-vcdiff, being C++, has a slightly different interface for dealing with binary blobs.

    The code is over on Bitbucket in my python-openvcdiff repository.

  • Parsing email with attachments in python

    Recently I needed to parse the body and attachments out of multipart emails and post the resulting data to a service, so I wrote the code below to extract the text and HTML parts of an email as well as its attachments.

    The code below is the result. I used an object from Python's StringIO module to hold the attachment data because PIL didn't seem to recognize images unless I passed it either a Python file object or a StringIO object. Since this relies on the pure-Python StringIO module rather than the C one, that part should probably be rewritten, but it works as is, so I'm posting it for posterity.

    #!/usr/local/bin/python
    # vim:fileencoding=utf8
    
    from email.Header import decode_header
    from email.Parser import Parser as EmailParser
    from email.utils import parseaddr
    # cStringIO is no good here; use the pure-Python StringIO
    from StringIO import StringIO
    
    class NotSupportedMailFormat(Exception):
        pass
    
    def parse_attachment(message_part):
        content_disposition = message_part.get("Content-Disposition", None)
        if content_disposition:
            dispositions = content_disposition.strip().split(";")
            if dispositions[0].strip().lower() == "attachment":
    
                file_data = message_part.get_payload(decode=True)
                attachment = StringIO(file_data)
                attachment.content_type = message_part.get_content_type()
                attachment.size = len(file_data)
                attachment.name = None
                attachment.create_date = None
                attachment.mod_date = None
                attachment.read_date = None
    
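                for param in dispositions[1:]:
                    # Parameters look like ' filename="photo.jpg"', so strip
                    # whitespace and surrounding quotes; skip pieces with no "=".
                    if "=" not in param:
                        continue
                    name, value = param.strip().split("=", 1)
                    name = name.lower()
                    value = value.strip().strip('"')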
    
                    if name == "filename":
                        attachment.name = value
                    elif name == "create-date":
                        attachment.create_date = value  #TODO: datetime
                    elif name == "modification-date":
                        attachment.mod_date = value #TODO: datetime
                    elif name == "read-date":
                        attachment.read_date = value #TODO: datetime
                return attachment
    
        return None
    
    def parse(content):
        """
        Eメールのコンテンツを受け取りparse,encodeして返す
        """
        p = EmailParser()
        msgobj = p.parse(content)
        if msgobj['Subject'] is not None:
            decodefrag = decode_header(msgobj['Subject'])
            subj_fragments = []
            for s , enc in decodefrag:
                if enc:
                    s = unicode(s , enc).encode('utf8','replace')
                subj_fragments.append(s)
            subject = ''.join(subj_fragments)
        else:
            subject = None
    
        attachments = []
        body = None
        html = None
        for part in msgobj.walk():
            attachment = parse_attachment(part)
            if attachment:
                attachments.append(attachment)
            elif part.get_content_type() == "text/plain":
                if body is None:
                    body = ""
                body += unicode(
                    part.get_payload(decode=True),
                    part.get_content_charset('us-ascii'),  # fall back when no charset is declared
                    'replace'
                ).encode('utf8','replace')
            elif part.get_content_type() == "text/html":
                if html is None:
                    html = ""
                html += unicode(
                    part.get_payload(decode=True),
                    part.get_content_charset('us-ascii'),  # fall back when no charset is declared
                    'replace'
                ).encode('utf8','replace')
        return {
            'subject' : subject,
            'body' : body,
            'html' : html,
            'from' : parseaddr(msgobj.get('From'))[1], # drop the display name, keep only the address
            'to' : parseaddr(msgobj.get('To'))[1], # drop the display name, keep only the address
            'attachments': attachments,
        }
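    On Python 3 the standard library can do most of this bookkeeping itself. The following is a minimal sketch, not the code above, assuming the raw message is available as bytes: parsing with email.policy.default returns an EmailMessage, whose get_body() and iter_attachments() pick out the text parts and attachments, and whose get_filename() already handles the quoted filename parameter. The function name parse_message_bytes and the shape of the returned dict are illustrative choices.

```python
from email import message_from_bytes
from email.policy import default
from email.utils import parseaddr


def parse_message_bytes(raw):
    """Split a raw message (bytes) into subject, bodies, addresses and attachments."""
    # policy=default makes the parser return an EmailMessage, which has
    # get_body() and iter_attachments()
    msg = message_from_bytes(raw, policy=default)

    # get_body() returns the best matching text part, or None if there is none
    plain = msg.get_body(preferencelist=('plain',))
    html = msg.get_body(preferencelist=('html',))

    attachments = []
    for part in msg.iter_attachments():
        attachments.append({
            'name': part.get_filename(),            # filename= parameter, unquoted
            'content_type': part.get_content_type(),
            'data': part.get_payload(decode=True),  # transfer encoding removed
        })

    return {
        'subject': msg['Subject'],  # encoded words are decoded by the policy
        'body': plain.get_content() if plain is not None else None,
        'html': html.get_content() if html is not None else None,
        # keep only the address, as in the original
        'from': parseaddr(str(msg.get('From', '')))[1],
        'to': parseaddr(str(msg.get('To', '')))[1],
        'attachments': attachments,
    }
```

    Attachment data comes back as bytes; wrapping it in io.BytesIO gives PIL the file-like object it wants.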
    
  • Transactions on App Engine

    Data on App Engine is stored in Google's Bigtable-based Datastore, which supports transactions. However, those transactions are quite limited:

    1. You can only execute callables inside transactions, which means you basically pass a function to run_in_transaction(). This can sometimes be a pain, but it can generally be worked around with decorators and the like.
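      from google.appengine.ext import db

      def my_update_function(key):
        ent = db.get(key)
        # Some update code here
        ent.put()

      # key is the db.Key of the entity to update
      db.run_in_transaction(my_update_function, key)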
    2. You can only update entities in the same entity group, which means all the entities must be in the same ancestor tree. This can make updating entities with various relationships in a transaction hard, or impossible to do in a general way.
    3. You cannot run queries (filters) in a transaction. That means no kind of select at all, period, so you cannot do the following:
      from google.appengine.ext import db

      class ModelA(db.Model):
        pass

      class ModelB(db.Model):
        modela = db.ReferenceProperty(ModelA)

      def update_func():
        # Sorry, this won't work
        modelas = ModelA.all()

        # This is the only thing that works
        modela = ModelA.get_by_id(123)

        # Jeez, you can't do this either!
        modelb = ModelB.all().filter('modela =', modela)
      You can only do gets based on an entity's key, so if you have a relationship like the one above, you need to be able to derive ModelB's key from ModelA's key. And since you cannot choose the numeric IDs entities are saved with (numeric IDs are always assigned for you), you will need to assign key names to both entities.

    All this makes transactions on App Engine a bit of a pain, but workable if you put some effort into it. In the end you'll want to use key names for almost every entity that matters anyway, since current backup solutions for App Engine rely on key names to preserve entity keys when backing up and restoring. It wouldn't be much fun if the URLs of all your entities with numeric IDs changed after restoring the data from a backup.

  • Werkzeug and reverse urls

    I wanted to improve a Google App Engine application that a friend of mine created (ほぼ汎用イベント管理ツール, roughly "an almost general-purpose event management tool", in Japanese) and noticed that he was redirecting to hard-coded URLs. He is using Werkzeug to handle URL routing, so I wondered if there was a way to generate URLs from a name like you can in Django.

    It turns out you can, but you give it an endpoint name rather than a URL name.

    urls.py
    from werkzeug.routing import Map, Rule, RuleTemplate, Submount, EndpointPrefix

    resource = RuleTemplate([
      Rule('/${name}/', endpoint='${name}_index'),
      Rule('/${name}/create/', endpoint='create_${name}'),
      Rule('/${name}/update/<string:${var}>/', endpoint='update_${name}'),
      Rule('/${name}/delete/<string:${var}>/', endpoint='delete_${name}'),
    ])

    url_map = Map([
      Rule('/', endpoint='index'),
      Rule('/<string:slug>/', endpoint='project_or_event'),
      Rule('/form/<string:key>/<string:slug>/', endpoint='form'),
      Submount('/account', [
        Rule('/', endpoint='account_index'),
        Rule('/create/', endpoint='create_account'),
        Rule('/update/', endpoint='update_account'),
        Rule('/delete/', endpoint='delete_account'),
        Rule('/event/cancel/<string:slug>/', endpoint='event_cancel'),
      ]),
      EndpointPrefix('admin_', [
        Submount('/admin', [
          resource(name='account', var='email'),
          resource(name='project', var='slug'),
          resource(name='event', var='slug'),
          resource(name='program', var='slug'),
          resource(name='application', var='slug'),
        ]),
      ])
    ])
    views.py
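    from werkzeug.utils import redirect as wredirect
    from urls import url_map

    def reverse(endpoint, values=None):
      # Build the URL for an endpoint and return a redirect response to it
      c = url_map.bind('')
      return wredirect(c.build(endpoint, values))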

    ...
       return reverse('form', dict(key=key, slug=slug))
    ...

    You need to give the build function the full endpoint name. In the example above, the admin endpoints look like admin_create_${name}, where ${name} is the name of a resource, so ${name} has to be filled in when you pass the endpoint to build:

    ...
      return reverse('admin_create_event')
    ...
  • Field/column Queries in Django

    One of the neat things making its way into Django 1.1 is F() object queries. The F object is a bit like the Q object in that it can be used in queries, but it represents a database field on the right-hand side of a comparison.

    For these examples I'll use the models from the "Making Queries" section of the Django documentation.

    from django.db import models

    class Blog(models.Model):
        name = models.CharField(max_length=100)
        tagline = models.TextField()

        def __unicode__(self):
            return self.name

    class Author(models.Model):
        name = models.CharField(max_length=50)
        email = models.EmailField()

        def __unicode__(self):
            return self.name

    class Entry(models.Model):
        blog = models.ForeignKey(Blog)
        headline = models.CharField(max_length=255)
        body_text = models.TextField()
        pub_date = models.DateTimeField()
        authors = models.ManyToManyField(Author)
        n_comments = models.IntegerField()
        n_pingbacks = models.IntegerField()
        rating = models.IntegerField()

        def __unicode__(self):
            return self.headline

    Here we can do cool stuff like query for blog entries that have fewer pingbacks than comments.

    >>> from django.db.models import F
    >>> Entry.objects.filter(n_pingbacks__lt=F('n_comments'))

    You can perform arithmetic on columns, or add columns together.

    >>> Entry.objects.filter(n_pingbacks__lt=F('n_comments') * 2)
    >>> Entry.objects.filter(rating__lt=F('n_comments') + F('n_pingbacks'))

    You can even span relationships across tables:

    >>> Entry.objects.filter(authors__name=F('blog__name'))

    This produced the following SQL (ftester is the name of the application I made to test this):

    SELECT `ftester_entry`.`id`, `ftester_entry`.`blog_id`, `ftester_entry`.`headline`, `ftester_entry`.`body_text`, `ftester_entry`.`pub_date`, `ftester_entry`.`n_comments`, `ftester_entry`.`n_pingbacks`, `ftester_entry`.`rating` FROM `ftester_entry` INNER JOIN `ftester_blog` ON (`ftester_entry`.`blog_id` = `ftester_blog`.`id`) INNER JOIN `ftester_entry_authors` ON (`ftester_entry`.`id` = `ftester_entry_authors`.`entry_id`) INNER JOIN `ftester_author` ON (`ftester_entry_authors`.`author_id` = `ftester_author`.`id`) WHERE `ftester_author`.`name` =  `ftester_blog`.`name` LIMIT 21

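    Note: As an aside, the LIMIT 21 on this query doesn't mean Django always fetches records in small batches. It comes from the interactive shell: when a QuerySet is printed, its repr only shows the first 20 results, so Django fetches 21 rows to know whether to add a "remaining elements truncated" marker. Iterating over the QuerySet in your own code doesn't add that limit.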

    But the main reason the F() object was created was to allow using a column's current value in an update. This lets you do things like add 1 to the pingbacks of every entry in one go, without selecting the whole batch and updating each row yourself.

    Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
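    To see what these F() expressions boil down to in the database, here is a minimal sketch using the standard library's sqlite3 module and a hypothetical, cut-down entry table. It is plain SQL written by hand, not SQL generated by Django: the comparison and the increment are evaluated by the database, which is the idea F() lets the ORM express.

```python
import sqlite3


def f_expression_demo():
    """Run column-vs-column SQL similar in spirit to what F() expresses."""
    conn = sqlite3.connect(':memory:')
    # Hypothetical cut-down version of the Entry table
    conn.execute('CREATE TABLE entry ('
                 'id INTEGER PRIMARY KEY, n_comments INTEGER, n_pingbacks INTEGER)')
    conn.executemany('INSERT INTO entry (n_comments, n_pingbacks) VALUES (?, ?)',
                     [(5, 2), (1, 3), (4, 4)])

    # Compare one column with another in the WHERE clause,
    # the idea behind filter(n_pingbacks__lt=F('n_comments'))
    fewer = conn.execute('SELECT id FROM entry '
                         'WHERE n_pingbacks < n_comments ORDER BY id').fetchall()

    # The database computes each new value from the old one in a single
    # statement, the idea behind update(n_pingbacks=F('n_pingbacks') + 1)
    conn.execute('UPDATE entry SET n_pingbacks = n_pingbacks + 1')
    after = conn.execute('SELECT n_pingbacks FROM entry ORDER BY id').fetchall()

    conn.close()
    return [row[0] for row in fewer], [row[0] for row in after]
```

    Because the increment happens inside one UPDATE, no rows are read into Python first, so concurrent updates can't overwrite each other with stale values the way a read-modify-write loop can.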

  • Python date range iterator

    I couldn't find anything that did quite what I wanted, so I wrote a simple Python generator that yields the dates between two datetimes.

    def datetimeIterator(from_date, to_date):
        from datetime import timedelta
        if from_date > to_date:
            return
        else:
            while from_date <= to_date:
                yield from_date
                from_date = from_date + timedelta(days = 1)
            return
    

    Update: It didn't take me long to realize that it wasn't as nice as it could have been.

    from datetime import datetime,timedelta
    
    # Note: the datetime.now() default is evaluated once, when the function is defined
    def datetimeIterator(from_date=datetime.now(), to_date=None):
        while to_date is None or from_date <= to_date:
            yield from_date
            from_date = from_date + timedelta(days = 1)
        return
    

    Another update, based on readers' comments:

    from datetime import datetime,timedelta
    
    def datetimeIterator(from_date=None, to_date=None, delta=timedelta(minutes=1)):
        from_date = from_date or datetime.now()
        while to_date is None or from_date <= to_date:
            yield from_date
            from_date = from_date + delta
        return
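    A quick usage sketch of the last version, ported to Python 3 (the trailing bare return is dropped since it isn't needed). Passing delta=timedelta(days=1) gives a day-by-day range with both endpoints included; with no to_date the generator never stops, so itertools.islice is one way to take a fixed number of items.

```python
from datetime import datetime, timedelta
from itertools import islice


def datetimeIterator(from_date=None, to_date=None, delta=timedelta(minutes=1)):
    # Same logic as the last version above
    from_date = from_date or datetime.now()
    while to_date is None or from_date <= to_date:
        yield from_date
        from_date = from_date + delta


# Bounded range of days, endpoints included: Jan 30, Jan 31, Feb 1, Feb 2
days = list(datetimeIterator(datetime(2009, 1, 30), datetime(2009, 2, 2),
                             delta=timedelta(days=1)))

# Unbounded range: take only the first three hourly values
hours = list(islice(datetimeIterator(datetime(2009, 1, 1),
                                     delta=timedelta(hours=1)), 3))
```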
    
  • Introduction to Algorithms

    Today my copy of Introduction to Algorithms came in the mail (a gift from the family). Mostly inspired by Peteris Krumins, I've decided to revisit the classic algorithms, as it's been a while since I've taken a look at them.

    I've also decided to work through the MIT Introduction to Algorithms course to revisit algorithms and concepts. I won't provide any lecture notes since Peteris did a much better job of writing them than I ever could, but I did go ahead and write Python implementations of the sorting algorithms covered in the first lecture. These haven't been tested extensively, so there might be bugs, but I'm pretty sure they work. I'd be interested to see how well they handle large inputs, particularly the merge sort.

    insertion-sort.py

    #!/usr/bin/env python

    def sort(array):
      for j in xrange(1, len(array)):
        i = j - 1
        key = array[j]
        while i >= 0 and key < array[i]:
          array[i+1] = array[i]
          i = i - 1
        array[i+1] = key
      return array

    merge-sort.py

    #!/usr/bin/env python

    def sort(array):
      mergesort(array, 0, len(array))
      return array

    def mergesort(array, start, end):
      if end > start + 1:
        pivot = (start + end) // 2  # integer midpoint
        mergesort(array, start, pivot)
        mergesort(array, pivot, end)
        merge(array, start, pivot, end)
     
    def merge(array, start, pivot, end):
      l = array[start:pivot]
      lenl = pivot - start
      r = array[pivot:end]
      lenr = end - pivot
      i = j = 0
      for k in xrange(start,end):
        if j >= lenr or (i < lenl and l[i] <= r[j]):
          array[k] = l[i]
          i += 1
        else:
          array[k] = r[j]
          j += 1
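    Since these haven't been tested extensively, one cheap way to gain confidence is to compare them against Python's built-in sorted() on many random lists, including empty and one-element ones. A minimal sketch, ported to Python 3 (range instead of xrange, // for the midpoint) with the sort()/mergesort() wrapper folded into default arguments; the check() helper is hypothetical.

```python
import random


def insertion_sort(array):
    for j in range(1, len(array)):
        key = array[j]
        i = j - 1
        # Shift larger elements one slot to the right
        while i >= 0 and key < array[i]:
            array[i + 1] = array[i]
            i -= 1
        array[i + 1] = key
    return array


def merge_sort(array, start=0, end=None):
    if end is None:
        end = len(array)
    if end > start + 1:
        middle = (start + end) // 2
        merge_sort(array, start, middle)
        merge_sort(array, middle, end)
        merge(array, start, middle, end)
    return array


def merge(array, start, middle, end):
    left, right = array[start:middle], array[middle:end]
    i = j = 0
    for k in range(start, end):
        # Take from the left half while it has the smaller (or equal) head
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            array[k] = left[i]
            i += 1
        else:
            array[k] = right[j]
            j += 1


def check(sort_func, trials=200, max_len=50):
    """Compare sort_func with sorted() on random lists of random length."""
    for _ in range(trials):
        data = [random.randint(-20, 20) for _ in range(random.randint(0, max_len))]
        if sort_func(list(data)) != sorted(data):
            return False
    return True
```

    For the large-input question, the timeit module is an easy way to measure; insertion sort's running time grows quadratically while merge sort's grows as n log n, so the gap widens quickly as the input gets bigger.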
  • Django Sitemap Framework

    Using the Django sitemap framework is so easy it's almost no work at all. Just write a Sitemap subclass and register it in a sitemaps dictionary in urls.py. The framework calls your Sitemap's items() method to get the list of objects to put in the sitemap, and by default calls get_absolute_url() on each object to get its URL.

    models.py

    from django.db import models
    ...
    class Entry(models.Model):
    ...
        @models.permalink
        def get_absolute_url(self):
            return ...
    ...

    sitemap.py

    from django.contrib.sitemaps import Sitemap
    from mysite.blog.models import Entry

    class BlogSitemap(Sitemap):
        priority = 0.5

        def items(self):
            return Entry.objects.filter(is_draft=False)

        def lastmod(self, obj):
            return obj.pub_date

        # changefreq can be callable too
        def changefreq(self, obj):
            return "daily" if obj.comments_open() else "never"

    urls.py

    from mysite.blog.sitemap import BlogSitemap
    ...
    sitemaps = {
        "blog": BlogSitemap
    }
    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps})
    ...

    You can even generate a sitemap index, and Django will paginate each sitemap at 50,000 URLs (the most a single sitemap file may contain) so that crawlers don't have problems with large sitemaps.
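    To make it concrete what the framework does with items(), each object's URL, lastmod(), changefreq and priority, here is a minimal standard-library sketch that renders the same kind of sitemaps.org <urlset> document. It is not Django's implementation; render_sitemap and its tuple input format are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'


def render_sitemap(entries):
    """entries: iterable of (location, lastmod, changefreq, priority) tuples.

    lastmod, changefreq and priority may be None, in which case the
    corresponding element is left out.
    """
    urlset = ET.Element('urlset', xmlns=SITEMAP_NS)
    for location, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, 'url')
        ET.SubElement(url, 'loc').text = location
        if lastmod is not None:
            ET.SubElement(url, 'lastmod').text = lastmod.isoformat()
        if changefreq is not None:
            ET.SubElement(url, 'changefreq').text = changefreq
        if priority is not None:
            ET.SubElement(url, 'priority').text = str(priority)
    return ET.tostring(urlset, encoding='unicode')
```

    In the BlogSitemap above, each tuple would correspond to one Entry: its get_absolute_url() (prefixed with the site's domain), its pub_date, the result of changefreq(), and the class-level priority of 0.5.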

  • Django admin inline forms

    For my new project dlife (Update: now django-lifestream), I implemented a simple comments interface that lets users comment on imported feed items. I wanted to support this in the admin in the usual way, so that when you click on an item in the admin you can see all of its comments and edit them from the item's page.

    I found that you can use inline forms in the admin, but they show a bunch of blank forms even when the item doesn't have any comments yet. The number of blank forms is controlled by the inline's extra option (3 by default), so that's where I'll start when I mess with this a bit more later to get the behavior I want.

    models.py

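    from django.db import models
    from django.contrib.auth.models import User

    # Item (the imported feed item model) is assumed to be defined earlier in this file
    class Comment(models.Model):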
      '''An item comment'''
      comment_item = models.ForeignKey(Item)
      comment_date = models.DateTimeField()
      comment_user = models.ForeignKey(User, null=True, blank=True)
      comment_name = models.CharField(max_length=30)
      comment_email = models.EmailField()
      comment_homepage = models.URLField(max_length=300)
      comment_content = models.TextField(null=True, blank=True)
     
      class Meta:
        db_table="comments"
        ordering=["comment_item", "-comment_date"]

    admin.py

    from django.contrib import admin
    from .models import Comment, Item

    class CommentInline(admin.StackedInline):
      model           = Comment
      extra           = 0   # don't show blank forms for items without comments
      exclude         = ['comment_item','content_type','object_id']

    class ItemAdmin(admin.ModelAdmin):
      list_display    = ('item_title', 'item_date')
      exclude         = ['item_clean_content',]
      list_filter     = ('item_feed',)
      search_fields   = ('item_title','item_clean_content')
      list_per_page   = 20
     
      inlines         = [CommentInline,]

    admin.site.register(Item, ItemAdmin)