Ian Lewis
Ian Lewis is a web developer living in Tokyo, Japan. His current interests are Django, Python, alternative databases, and rapid web application development.
  • Parsing email with attachments in python

    Recently I needed to be able to parse out attachments and body from multipart emails and use the resulting data to post to a service. So I wrote the code below to parse out text and html portions of the email and also parse out attachments.

    I used a StringIO object from the pure-Python StringIO module to hold the attachment data because PIL did not seem to recognize images unless I passed it either a Python file object or a StringIO object. Since it relies on the pure-Python StringIO module rather than the faster C one (cStringIO), that portion should probably be rewritten. It works as is, though, so I'll post it for posterity.

    #!/usr/local/bin/python
    # vim:fileencoding=utf8
    
    from email.Header import decode_header
    import email
    from base64 import b64decode
    import sys
    from email.Parser import Parser as EmailParser
    from email.utils import parseaddr
    # cStringIO does not work here
    from StringIO import StringIO
    
    class NotSupportedMailFormat(Exception):
        pass
    
    def parse_attachment(message_part):
        content_disposition = message_part.get("Content-Disposition", None)
        if content_disposition:
            dispositions = content_disposition.strip().split(";")
            if dispositions[0].strip().lower() == "attachment":
    
                file_data = message_part.get_payload(decode=True)
                attachment = StringIO(file_data)
                attachment.content_type = message_part.get_content_type()
                attachment.size = len(file_data)
                attachment.name = None
                attachment.create_date = None
                attachment.mod_date = None
                attachment.read_date = None
    
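                for param in dispositions[1:]:
                    if "=" not in param:
                        continue
                    # Split on the first "=" only, then trim whitespace and quotes
                    name, value = param.split("=", 1)
                    name = name.strip().lower()
                    value = value.strip().strip('"')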
    
                    if name == "filename":
                        attachment.name = value
                    elif name == "create-date":
                        attachment.create_date = value  #TODO: datetime
                    elif name == "modification-date":
                        attachment.mod_date = value #TODO: datetime
                    elif name == "read-date":
                        attachment.read_date = value #TODO: datetime
                return attachment
    
        return None
    
    def parse(content):
        """
        Takes the content of an email, parses and encodes it, and returns the result.
        """
        p = EmailParser()
        msgobj = p.parse(content)
        if msgobj['Subject'] is not None:
            decodefrag = decode_header(msgobj['Subject'])
            subj_fragments = []
            for s , enc in decodefrag:
                if enc:
                    s = unicode(s , enc).encode('utf8','replace')
                subj_fragments.append(s)
            subject = ''.join(subj_fragments)
        else:
            subject = None
    
        attachments = []
        body = None
        html = None
        for part in msgobj.walk():
            attachment = parse_attachment(part)
            if attachment:
                attachments.append(attachment)
            elif part.get_content_type() == "text/plain":
                if body is None:
                    body = ""
                body += unicode(
                    part.get_payload(decode=True),
                    part.get_content_charset() or 'ascii',  # charset may be missing
                    'replace'
                ).encode('utf8','replace')
            elif part.get_content_type() == "text/html":
                if html is None:
                    html = ""
                html += unicode(
                    part.get_payload(decode=True),
                    part.get_content_charset() or 'ascii',  # charset may be missing
                    'replace'
                ).encode('utf8','replace')
        return {
            'subject' : subject,
            'body' : body,
            'html' : html,
            'from' : parseaddr(msgobj.get('From'))[1], # extract only the address, dropping the display name
            'to' : parseaddr(msgobj.get('To'))[1], # extract only the address, dropping the display name
            'attachments': attachments,
        }
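
    For comparison, here is a minimal sketch of the same idea on Python 3, assuming the standard library's newer email API (policy.default, get_body and iter_attachments). It is not a port of the code above: parse_email and the sample message are made up for illustration, and BytesIO stands in for StringIO because attachment payloads are bytes on Python 3.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr
from io import BytesIO


def parse_email(raw_bytes):
    """Parse raw message bytes into subject, bodies, addresses and attachments."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    # With policy.default, encoded headers are decoded when accessed.
    subject = msg["Subject"]
    subject = str(subject) if subject is not None else None

    # get_body() picks the best body part of the requested kind, or None.
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))

    attachments = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""
        attachments.append({
            "name": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(data),
            "file": BytesIO(data),  # file-like object, e.g. for an image library
        })

    return {
        "subject": subject,
        "body": plain.get_content() if plain is not None else None,
        "html": html.get_content() if html is not None else None,
        "from": parseaddr(str(msg.get("From", "")))[1],
        "to": parseaddr(str(msg.get("To", "")))[1],
        "attachments": attachments,
    }


# Build a small multipart message to try the parser on (sample data only).
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample["From"] = "Alice Example <alice@example.com>"
sample["To"] = "bob@example.com"
sample.set_content("Hello Bob")
sample.add_attachment(b"\x00\x01\x02", maintype="application",
                      subtype="octet-stream", filename="data.bin")

parsed = parse_email(sample.as_bytes())
```

    Here the attachment metadata is returned as plain dictionaries instead of being set as attributes on the file object, which sidesteps the question of which StringIO implementation allows extra attributes.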
    
  • Transactions on Appengine

    Data on Appengine is stored in Google's BigTable Datastore, which has support for transactions. However, the transactions are quite limited:

    1. You can only execute callables inside transactions, which means you basically call run_in_transaction() on a function. This can sometimes be a pain but can generally be worked around with decorators and the like.
      def my_update_function():
        # Some update code here
        ent.put()

      db.run_in_transaction(my_update_function)
    2. You can only update entities in the same entity group. This means all entities must be in the same ancestor tree. This can make updating entities with various relationships hard or impossible to do in a general way in a transaction.
    3. You cannot do filters in a transaction, which means you cannot do any kind of select, period. That rules out the following:
      class ModelA(db.Model):
        pass

      class ModelB(db.Model):
        modela = db.ReferenceProperty(ModelA)

      def update_func():
        # Sorry this won't work
        modelas = ModelA.all()

        # This is the only thing that works
        modela = ModelA.get_by_id(123)

        # Jeez, you can't do this either!
        modelb = ModelB.all().filter('modela =', modela)
      You can only do gets based on the key of an entity. So if you have a relationship like the one above, you need to be able to derive the key for ModelB from the key for ModelA. And since you cannot choose the numeric IDs that entities are saved with (numeric IDs are always assigned), you will need to assign key names to both entities.
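
    One way to make the ModelB key derivable is to build its key name from the ModelA key name. The sketch below is plain Python and only shows a naming scheme; the helper names and the "modela:" / "modelb-for:" prefixes are hypothetical, and the datastore calls appear only in comments.

```python
def modela_key_name(slug):
    """Hypothetical scheme: key name for a ModelA entity, built from a slug."""
    return "modela:%s" % slug


def modelb_key_name(a_key_name):
    """Hypothetical scheme: key name for the ModelB entity tied to a ModelA."""
    return "modelb-for:%s" % a_key_name


a_name = modela_key_name("pycon-2009")
b_name = modelb_key_name(a_name)

# Inside a transaction both entities could then be fetched by key alone,
# without any query, for example:
#   modela = ModelA.get_by_key_name(a_name)
#   modelb = ModelB.get_by_key_name(b_name, parent=modela)
# Passing parent=modela assumes ModelB was created as a child of ModelA so
# that both entities live in the same entity group.
```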

    All this makes transactions a bit of a pain in Appengine but workable if you put a bit of effort into it. In the end you'll want to use key names for almost every entity that matters, as current backup solutions for Appengine rely on key names to maintain the keys of entities when backing up and restoring. It wouldn't be too fun if all the URLs for entities that had numeric IDs changed after restoring the data from a backup.

  • Werkzeug and reverse urls

    I wanted to improve a Google Appengine application that a friend of mine created (ほぼ汎用イベント管理ツール(jp)) and noticed that he was redirecting directly to hard-coded URLs. He is using Werkzeug to handle URL routing, so I wondered if there was a way to generate URLs from a name like you can in Django.

    It turns out you can, but you give it an endpoint name rather than a URL name.

    urls.py
    from werkzeug.routing import Map, Rule, RuleTemplate, Submount, EndpointPrefix

    resource = RuleTemplate([
      Rule('/${name}/', endpoint='${name}_index'),
      Rule('/${name}/create/', endpoint='create_${name}'),
      Rule('/${name}/update/<string:${var}>/', endpoint='update_${name}'),
      Rule('/${name}/delete/<string:${var}>/', endpoint='delete_${name}'),
    ])

    url_map = Map([
      Rule('/', endpoint='index'),
      Rule('/<string:slug>/', endpoint='project_or_event'),
      Rule('/form/<string:key>/<string:slug>/', endpoint='form'),
      Submount('/account', [
        Rule('/', endpoint='account_index'),
        Rule('/create/', endpoint='create_account'),
        Rule('/update/', endpoint='update_account'),
        Rule('/delete/', endpoint='delete_account'),
        Rule('/event/cancel/<string:slug>/', endpoint='event_cancel'),
      ]),
      EndpointPrefix('admin_', [
        Submount('/admin', [
          resource(name='account', var='email'),
          resource(name='project', var='slug'),
          resource(name='event', var='slug'),
          resource(name='program', var='slug'),
          resource(name='application', var='slug'),
        ]),
      ])
    ])
    views.py
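    from werkzeug.utils import redirect as wredirect
    from urls import url_map

    def reverse(endpoint, values=None):
      # Bind the map to a (blank) server name to get an adapter that can build URLs
      c = url_map.bind('')
      return wredirect(c.build(endpoint, values))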

    ...
       return reverse('form', dict(key=key, slug=slug))
    ...

    You need to give the build function a full endpoint. In the above example you can have endpoints like admin_create_${name}, where ${name} is the name of a resource. The placeholder needs to be filled in before passing the endpoint to build.

    ...
      return reverse('admin_create_event')
    ...
  • Field/column Queries in Django

    One of the neat things making its way into Django 1.1 is F object queries. The F object is kind of like the Q object in that it can be used in queries, but it represents a database field on the right-hand side of an equality/inequality.

    For the example I'll use the example models from the "Making Queries" section of the Django Documentation.

    class Blog(models.Model):
        name = models.CharField(max_length=100)
        tagline = models.TextField()

        def __unicode__(self):
            return self.name

    class Author(models.Model):
        name = models.CharField(max_length=50)
        email = models.EmailField()

        def __unicode__(self):
            return self.name

    class Entry(models.Model):
        blog = models.ForeignKey(Blog)
        headline = models.CharField(max_length=255)
        body_text = models.TextField()
        pub_date = models.DateTimeField()
        authors = models.ManyToManyField(Author)
        n_comments = models.IntegerField()
        n_pingbacks = models.IntegerField()
        rating = models.IntegerField()

        def __unicode__(self):
            return self.headline

    Here we can do cool stuff like query for blog entries that have fewer pingbacks than comments.

    >>> from django.db.models import F
    >>> Entry.objects.filter(n_pingbacks__lt=F('n_comments'))

    You can perform operations on columns or add columns together.

    >>> Entry.objects.filter(n_pingbacks__lt=F('n_comments') * 2)
    >>> Entry.objects.filter(rating__lt=F('n_comments') + F('n_pingbacks'))

    You can even span relationships across tables

    >>> Entry.objects.filter(authors__name=F('blog__name'))

    This query ended up like this. ftester is the name of the application I made to test this.

    SELECT `ftester_entry`.`id`, `ftester_entry`.`blog_id`, `ftester_entry`.`headline`, `ftester_entry`.`body_text`, `ftester_entry`.`pub_date`, `ftester_entry`.`n_comments`, `ftester_entry`.`n_pingbacks`, `ftester_entry`.`rating` FROM `ftester_entry` INNER JOIN `ftester_blog` ON (`ftester_entry`.`blog_id` = `ftester_blog`.`id`) INNER JOIN `ftester_entry_authors` ON (`ftester_entry`.`id` = `ftester_entry_authors`.`entry_id`) INNER JOIN `ftester_author` ON (`ftester_entry_authors`.`author_id` = `ftester_author`.`id`) WHERE `ftester_author`.`name` =  `ftester_blog`.`name` LIMIT 21

    Note: As an aside, it's interesting to note the LIMIT on this query, which only gets 21 records. That comes from printing the queryset in the shell: the repr shows at most 20 results, so Django fetches 21 to find out whether the output needs to be truncated.

    But the reason the F() object was created was to allow using the value of one column in another column during an update. This allows you to do things like add 1 to the pingbacks for every entry in one go, without selecting the whole batch and updating the field on each object.

    Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
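
    To see why this is a single statement rather than a select followed by many saves, here is a rough sketch using the standard library's sqlite3 module instead of Django. The table and sample rows are made up, and the SQL is only an approximation of what the ORM would generate for the F() filter and update shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "id INTEGER PRIMARY KEY, headline TEXT, "
    "n_comments INTEGER, n_pingbacks INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (headline, n_comments, n_pingbacks) VALUES (?, ?, ?)",
    [("a", 5, 2), ("b", 1, 3), ("c", 4, 4)],
)

# Column compared with column, roughly filter(n_pingbacks__lt=F('n_comments')).
fewer_pingbacks = [
    row[0]
    for row in conn.execute(
        "SELECT headline FROM entry WHERE n_pingbacks < n_comments ORDER BY id"
    )
]

# One statement updates every row in the database itself,
# roughly update(n_pingbacks=F('n_pingbacks') + 1).
conn.execute("UPDATE entry SET n_pingbacks = n_pingbacks + 1")
pingbacks = [
    row[0] for row in conn.execute("SELECT n_pingbacks FROM entry ORDER BY id")
]
```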

  • Python date range iterator

    I couldn't find something that gave me quite what I wanted so I created a simple Python generator to give me the dates between two datetimes.

    def datetimeIterator(from_date, to_date):
        from datetime import timedelta
        if from_date > to_date:
            return
        else:
            while from_date <= to_date:
                yield from_date
                from_date = from_date + timedelta(days = 1)
            return

    Update: It didn't take me long to realize that it wasn't as nice as it could have been.

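    from datetime import datetime, timedelta

    def datetimeIterator(from_date=None, to_date=None):
        # Evaluate the default at call time; a datetime.now() default argument
        # would be frozen at the moment the function was defined.
        if from_date is None:
            from_date = datetime.now()
        while to_date is None or from_date <= to_date:
            yield from_date
            from_date = from_date + timedelta(days = 1)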
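
    As a usage sketch (Python 3), here is a variant of the generator together with both ways of consuming it. The name date_range and the step parameter are additions for illustration; itertools.islice is what keeps the open-ended form from running forever.

```python
from datetime import datetime, timedelta
from itertools import islice


def date_range(from_date, to_date=None, step=timedelta(days=1)):
    """Yield datetimes from from_date up to and including to_date.

    With to_date=None the generator never stops on its own.
    """
    current = from_date
    while to_date is None or current <= to_date:
        yield current
        current += step


# Bounded: every day from 30 January to 2 February 2009, inclusive.
bounded = list(date_range(datetime(2009, 1, 30), datetime(2009, 2, 2)))

# Open-ended: take only the first three values.
first_three = list(islice(date_range(datetime(2009, 1, 1)), 3))

# An inverted range simply yields nothing.
empty = list(date_range(datetime(2009, 2, 2), datetime(2009, 1, 30)))
```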
  • Introduction to Algorithms

    Today my copy of Introduction to Algorithms came in the mail (a gift from the family). I've decided, mostly inspired by Peteris Krumins, to revisit classic algorithms as it's been a while since I've taken a look at them.

    I have decided to also take a look at the MIT Intro to Algorithms course in order to revisit algorithms and concepts. I won't provide any lecture notes or anything since Peteris did a much better job of writing lecture notes than I ever could, but I did go ahead and create some Python implementations of the sorting algorithms covered in the first lecture. These haven't been tested extensively so there might be bugs, but I'm pretty sure they're working. I'd be interested to see how well these work with large input data, particularly the merge sort.

    insertion-sort.py

    #!/usr/bin/env python

    def sort(array):
      for j in xrange(1, len(array)):
        i = j - 1
        key = array[j]
        while i >= 0 and key < array[i]:
          array[i+1] = array[i]
          i = i - 1
        array[i+1] = key
      return array

    merge-sort.py

    #!/usr/bin/env python

    def sort(array):
      mergesort(array, 0, len(array))
      return array
     
    def mergesort(array, start, end):
      # Sort array[start:end] in place
      if end > start + 1:
        pivot = (start + end) // 2  # floor division, also correct on Python 3
        mergesort(array, start, pivot)
        mergesort(array, pivot, end)
        merge(array, start, pivot, end)
     
    def merge(array, start, pivot, end):
      l = array[start:pivot]
      lenl = pivot - start
      r = array[pivot:end]
      lenr = end - pivot
      i = j = 0
      for k in xrange(start,end):
        if j >= lenr or (i < lenl and l[i] <= r[j]):
          array[k] = l[i]
          i += 1
        else:
          array[k] = r[j]
          j += 1
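
    Since these haven't been tested extensively, here is a small sketch of how they could be checked against Python's built-in sorted() on random input. It restates both sorts in Python 3 form (range instead of xrange, merge folded into the recursive function); the check helper and its parameters are made up for this example.

```python
import random


def insertion_sort(array):
    for j in range(1, len(array)):
        key = array[j]
        i = j - 1
        # Shift larger elements one slot right, then drop key into the gap.
        while i >= 0 and key < array[i]:
            array[i + 1] = array[i]
            i -= 1
        array[i + 1] = key
    return array


def merge_sort(array, start=0, end=None):
    if end is None:
        end = len(array)
    if end > start + 1:
        pivot = (start + end) // 2
        merge_sort(array, start, pivot)
        merge_sort(array, pivot, end)
        # Merge the two sorted halves back into array[start:end].
        left, right = array[start:pivot], array[pivot:end]
        i = j = 0
        for k in range(start, end):
            if j >= len(right) or (i < len(left) and left[i] <= right[j]):
                array[k] = left[i]
                i += 1
            else:
                array[k] = right[j]
                j += 1
    return array


def check(sort_func, trials=50, max_len=200):
    """Compare sort_func with sorted() on random lists (fixed seed)."""
    rng = random.Random(0)
    for _ in range(trials):
        size = rng.randint(0, max_len)
        data = [rng.randint(-1000, 1000) for _ in range(size)]
        if sort_func(list(data)) != sorted(data):
            return False
    return True
```

    Raising max_len is a simple way to get a feel for how the two behave on larger input; insertion sort is quadratic, so it falls behind quickly.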
  • Django Sitemap Framework

    Using the Django sitemap framework is so easy it's almost no work at all. Just write a Sitemap class and register it in urls.py. The sitemap framework calls items() on your Sitemap to get the list of objects to put in the sitemap and then calls get_absolute_url() on each object.

    models.py

    from django.db import models
    ...
    class Entry(models.Model):
    ...
        @models.permalink
        def get_absolute_url(self):
            return ...
    ...

    sitemap.py

    from django.contrib.sitemaps import Sitemap
    from mysite.blog.models import Entry

    class BlogSitemap(Sitemap):
        priority = 0.5

        def items(self):
            return Entry.objects.filter(is_draft=False)

        def lastmod(self, obj):
            return obj.pub_date

        # changefreq can be callable too
        def changefreq(self, obj):
            return "daily" if obj.comments_open() else "never"

    urls.py

    from mysite.blog.sitemap import BlogSitemap
    ...
    sitemaps = {
        "blog": BlogSitemap
    }
    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps})
    ...

    You can even generate sitemap indexes, and the framework will paginate the sitemaps at Google's limit of 50,000 URLs so that Google has no problem crawling them.

  • Django admin inline forms

    For my new project dlife, I went about implementing a simple comments interface that would allow users to make comments on imported feed items. I wanted to support this in the admin in the typical manner such that when you click on an item in the admin, you can see all the comments and edit them from the item's page.

    I found that you can use inline forms in the admin but it seems to show a bunch of forms (3 by default) even though I don't have any comments for the item yet. I'll mess with this a bit more later to try to get the behavior I want.

    models.py

    class Comment(models.Model):
      '''An item comment'''
      comment_item = models.ForeignKey(Item)
      comment_date = models.DateTimeField()
      comment_user = models.ForeignKey(User, null=True, blank=True)
      comment_name = models.CharField(max_length=30)
      comment_email = models.EmailField()
      comment_homepage = models.URLField(max_length=300)
      comment_content = models.TextField(null=True, blank=True)
     
      class Meta:
        db_table="comments"
        ordering=["comment_item", "-comment_date"]

    admin.py

    class CommentInline(admin.StackedInline):
      model           = Comment
      max_num         = 1   #TODO: Fix this
      exclude         = ['comment_item','content_type','object_id']

    class ItemAdmin(admin.ModelAdmin):
      list_display    = ('item_title', 'item_date')
      exclude         = ['item_clean_content',]
      list_filter     = ('item_feed',)
      search_fields   = ('item_title','item_clean_content')
      list_per_page   = 20
     
      inlines         = [CommentInline,]
  • Feedparser and Django

    Over the weekend at Python Onsen I worked on a lifestream web application using Django and feedparser. I was really impressed with how simple feedparser is to use and how easy it is to get unified results from Atom or RSS feeds. You simply import feedparser and call feedparser.parse to parse a feed from a URL.

    feeds.py
    ...
    def update_feeds():
      feeds = Feed.objects.filter(feed_deleted=False)
      for feed in feeds:
        try:
          feed_items = feedparser.parse(feed.feed_url)
          for entry in feed_items['entries']:
    ...

    You can check out feeds.py here.

    The interesting bit is how I had to parse the dates, which sometimes include timezone info and other goodies. In my search for a way to deal with dates in various formats, I came across this blog entry, which describes the problem and some possible solutions. The solution I used was the simplest and most robust (please skip the comments talking about taking a slice of the date string). I took mikael's suggestion from the comments and used dateutil.parser to parse the date string into a proper datetime object.

    # Parse to an actual datetime object
    date_published = dateutil.parser.parse(date_published)

    This, however, leaves timezone info on the timestamp, which isn't supported by MySQL, so I hand-rolled some code to convert the timestamp to UTC and remove the timezone info.

    # Change the date to UTC and remove timezone info since MySQL doesn't
    # support it
    date_published = (date_published - date_published.utcoffset()).replace(tzinfo=None)

    I'm not sure this works in all situations yet, so I might go with something like another commenter's solution: converting feedparser's parsed date to a UTC timestamp before converting it to a datetime object. I think either way would work, but I'm not sure which is cleaner and less prone to breakage.
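
    Both approaches can be sketched with the standard library alone (Python 3). The function names are made up for this example. The first function treats a datetime with no timezone info as already being UTC, which is an assumption; the second relies on feedparser's *_parsed values being struct_time tuples already expressed in UTC.

```python
import time
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt):
    """Convert an aware datetime to a naive datetime in UTC.

    A naive datetime has no offset to subtract, so it is returned unchanged
    (assumption: naive values are already in UTC).
    """
    if dt.utcoffset() is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def struct_time_to_naive_utc(parsed):
    """Build a naive UTC datetime from a UTC struct_time.

    Only the first six fields (year through second) are used.
    """
    return datetime(*parsed[:6])


jst = timezone(timedelta(hours=9))
aware = datetime(2008, 10, 20, 9, 30, tzinfo=jst)
from_aware = to_naive_utc(aware)

parsed = time.struct_time((2008, 10, 20, 0, 30, 0, 0, 294, 0))
from_struct = struct_time_to_naive_utc(parsed)
```

    The guard for naive datetimes matters because subtracting utcoffset() fails when the parsed date string carried no timezone at all.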

  • Python Onsen Oct. 2008

    Last weekend I went to my second Python Onsen[jp] organized by Nakai-san(id:voluntas). I talked about Python Onsen in my first blog post here. Python Onsen is a 3 day event (Fri, Sat, Sun) but as before I only participated on Saturday and Sunday. This time I opted to work on creating a lifestream web app using feedparser and Django. feedparser is a snappy little parser for reading RSS and Atom feeds. The result was dlife which so far can parse a set of feeds and show them on a user's lifestream though it's not in any way user friendly yet (you have to update the feeds in the django shell :roll:).

    I also got to know my soon to be co-worker, Okano-san (id:tokibito[jp]), by talking about jQuery Internals' data() function and web/Django development.

    Here's a recap:

    • Worked on a sweetcron lifestream replacement in Django (dlife)
    • Onsen is pretty lonely by yourself.
    • Introduced id:tokibito[jp] to the jQuery Internals' data() function
    • Since I come on Saturday, I always miss introductions so I never know who is who.
    • feedparser is really simple and easy to use. Though I'm not sure what I'll do about pictures and video yet.
    • No one mentioned my blog or linked to it in their posts :(  (Maybe because I never write anything? )
    • In a raffle, I got a cool Python shirt from Accense Technologies'[jp] Masuda-san(id:whosaysni[jp]).
    • It was lonely going home by myself from Kinomiya station.

