Google Docs wouldn't let me share the presentation publicly with people outside our company's domain, and it gave me an error when I tried to download it as a PowerPoint file or PDF, so I was forced to recreate my presentation locally. Anyway, I've put the slides from my talk at PyCon Asia online; please check them out on SlideShare.
-
Key Value Storage Systems ... and Beyond ... with Python
2010-06-15 16:33:38
-
python-openvcdiff and Cython
2010-03-21 18:26:15
I started a project today to implement an interface to open-vcdiff using Cython. I'm not a C++ master and the Python C libraries are pretty new to me, but I managed to expose and implement a few methods of the VCDiffEncoder class. The hardest part so far has been figuring out how to use C++ standard library types like std::string. I'm also not sure how to interface with Python in a way that allows fast processing of potentially large binary data. Normally I would use a file-like object in Python to build up a string, but open-vcdiff, being C++, has a slightly different interface for dealing with binary blobs.
The code is over at Bitbucket in my python-openvcdiff repository.
-
Parsing email with attachments in python
2009-09-21 11:33:31
Recently I needed to parse the body and attachments out of multipart emails and use the resulting data to post to a service, so I wrote the code below to extract the text and HTML portions of an email along with any attachments.
I used a StringIO object from the Python StringIO module to hold attachment data because PIL didn't seem to recognize the images unless I used either a Python file object or a StringIO object. Since it relies on the pure-Python StringIO module rather than the C one (cStringIO), that portion should probably be rewritten. But it currently works as is, so I'll post it for posterity.
#!/usr/local/bin/python
# vim:fileencoding=utf8
from email.Header import decode_header
from email.Parser import Parser as EmailParser
from email.utils import parseaddr
# cStringIO won't do here: its objects don't accept extra attributes
from StringIO import StringIO

class NotSupportedMailFormat(Exception):
    pass

def parse_attachment(message_part):
    content_disposition = message_part.get("Content-Disposition", None)
    if content_disposition:
        dispositions = content_disposition.strip().split(";")
        if dispositions[0].strip().lower() == "attachment":
            file_data = message_part.get_payload(decode=True)
            attachment = StringIO(file_data)
            attachment.content_type = message_part.get_content_type()
            attachment.size = len(file_data)
            attachment.name = None
            attachment.create_date = None
            attachment.mod_date = None
            attachment.read_date = None

            for param in dispositions[1:]:
                if "=" not in param:
                    continue
                # Params look like ' filename="a.jpg"': strip the leading
                # space and the quotes, and split only on the first "="
                name, value = param.strip().split("=", 1)
                name = name.lower()
                value = value.strip('"')

                if name == "filename":
                    attachment.name = value
                elif name == "create-date":
                    attachment.create_date = value  #TODO: datetime
                elif name == "modification-date":
                    attachment.mod_date = value  #TODO: datetime
                elif name == "read-date":
                    attachment.read_date = value  #TODO: datetime
            return attachment

    return None

def parse(content):
    """
    Take a file-like object containing an email, parse it, and return
    its parts encoded as UTF-8.
    """
    p = EmailParser()
    msgobj = p.parse(content)
    if msgobj['Subject'] is not None:
        decodefrag = decode_header(msgobj['Subject'])
        subj_fragments = []
        for s, enc in decodefrag:
            if enc:
                s = unicode(s, enc).encode('utf8', 'replace')
            subj_fragments.append(s)
        subject = ''.join(subj_fragments)
    else:
        subject = None

    attachments = []
    body = None
    html = None
    for part in msgobj.walk():
        attachment = parse_attachment(part)
        if attachment is not None:
            attachments.append(attachment)
        elif part.get_content_type() == "text/plain":
            if body is None:
                body = ""
            body += unicode(
                part.get_payload(decode=True),
                # Fall back to ASCII when the part declares no charset
                part.get_content_charset() or 'us-ascii',
                'replace'
            ).encode('utf8', 'replace')
        elif part.get_content_type() == "text/html":
            if html is None:
                html = ""
            html += unicode(
                part.get_payload(decode=True),
                part.get_content_charset() or 'us-ascii',
                'replace'
            ).encode('utf8', 'replace')
    return {
        'subject': subject,
        'body': body,
        'html': html,
        'from': parseaddr(msgobj.get('From'))[1],  # keep only the address, drop the name
        'to': parseaddr(msgobj.get('To'))[1],  # keep only the address, drop the name
        'attachments': attachments,
    }
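On Python 3.6 and later, the standard library's email package can do most of this work itself. Below is a sketch of the same idea using BytesParser with policy.default, which returns EmailMessage objects that decode encoded headers and provide get_body() and iter_attachments(). The function names parse_message and _first_address and the shape of the returned dict are my own choices for illustration. Unlike the code above, it picks a single best text/plain and text/html part rather than concatenating every text part, and it returns attachment data as bytes instead of wrapping it in StringIO.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def _first_address(header):
    """Return only the address part (no display name), like parseaddr(...)[1]."""
    if header is None or not header.addresses:
        return None
    return header.addresses[0].addr_spec


def parse_message(raw_bytes):
    """Split a raw email into subject, text body, HTML body and attachments."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)

    # policy.default decodes RFC 2047 encoded words, so no decode_header() loop
    subject = msg["Subject"]
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))

    attachments = [
        {
            "name": part.get_filename(),
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),
        }
        for part in msg.iter_attachments()
    ]

    return {
        "subject": str(subject) if subject is not None else None,
        "body": text_part.get_content() if text_part is not None else None,
        "html": html_part.get_content() if html_part is not None else None,
        "from": _first_address(msg["From"]),
        "to": _first_address(msg["To"]),
        "attachments": attachments,
    }


# Usage: build a small multipart message and parse it back
sample = EmailMessage()
sample["Subject"] = "Héllo wörld"
sample["From"] = "Alice <alice@example.com>"
sample["To"] = "Bob <bob@example.com>"
sample.set_content("plain body")
sample.add_alternative("<p>html body</p>", subtype="html")
sample.add_attachment(b"\x89PNG data", maintype="image", subtype="png",
                      filename="a.png")
result = parse_message(sample.as_bytes())
```

The attachment bytes could still be wrapped in io.BytesIO if PIL needs a file-like object.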
-
Transactions on Appengine
2009-03-19 01:44:52
The way to store data on Appengine is Google's BigTable-based Datastore, which supports transactions. However, the transactions are quite limited:
- You can only execute callables inside transactions, which means you basically call run_in_transaction() on a function. This can sometimes be a pain, but it can generally be worked around with decorators and the like.
from google.appengine.ext import db

def my_update_function():
    # Some update code here
    ent.put()

db.run_in_transaction(my_update_function)
- You can only update entities in the same entity group. This means all entities must be in the same ancestor tree, which can make updating entities with various relationships in a transaction hard or impossible to do in a general way.
- You cannot do filters in a transaction. This means you cannot run any kind of query, period. For example, you cannot do the following:
from google.appengine.ext import db

class ModelA(db.Model):
    pass

class ModelB(db.Model):
    modela = db.ReferenceProperty(ModelA)

def update_func():
    # Sorry, this won't work
    modelas = ModelA.all().fetch(10)
    # This is the only thing that works
    modela = ModelA.get_by_id(123)
    # Jeez, you can't do this either!
    modelb = ModelB.all().filter('modela =', modela).get()

db.run_in_transaction(update_func)
You can only do gets based on the key of an entity. So if you have a relationship like the one above, you need to be able to derive the key of ModelB from the key of ModelA. And since you cannot choose the numeric IDs entities are saved with (numeric IDs are always assigned for you), you will need to assign key names to both entities.
All this makes transactions a bit of a pain on Appengine, but workable if you put some effort into it. In the end you'll want to use key names for almost every entity that matters, since current backup solutions for Appengine rely on key names to preserve entity keys when backing up and restoring. It wouldn't be much fun if all the URLs for entities with numeric IDs changed after restoring the data from a backup.
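The first limitation above mentions working around run_in_transaction() with decorators. Here is a minimal sketch of one such decorator, assuming only that the runner is called as runner(func, *args, **kwargs), the way db.run_in_transaction is. The names transactional, fake_run_in_transaction and increment are hypothetical, and the runner is passed in as an argument so the sketch can be tried without the App Engine SDK.

```python
import functools


def transactional(run_in_transaction):
    """Hypothetical decorator factory: route every call of the decorated
    function through the given transaction runner."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The runner gets the undecorated function plus its arguments,
            # so callers no longer have to wrap each call themselves.
            return run_in_transaction(func, *args, **kwargs)
        return wrapper
    return decorator


# Stand-in runner for trying this outside App Engine: it records the call
# and invokes the function directly (no real transaction is involved).
calls = []


def fake_run_in_transaction(func, *args, **kwargs):
    calls.append(func.__name__)
    return func(*args, **kwargs)


@transactional(fake_run_in_transaction)
def increment(counter, amount=1):
    return counter + amount
```

On App Engine you would pass the real runner instead, as in @transactional(db.run_in_transaction).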
-
Werkzeug and reverse urls
2009-03-14 11:57:52
I wanted to improve a Google Appengine application that a friend of mine created (ほぼ汎用イベント管理ツール, "an almost general-purpose event management tool" (jp)) and noticed that he was redirecting to hard-coded URLs. He is using Werkzeug to handle URL routing, so I wondered if there was a way to generate URLs from a name like you can in Django.
It turns out you can, but you give it an endpoint name rather than a URL name.
urls.py
from werkzeug.routing import Map, Rule, RuleTemplate, Submount, EndpointPrefix

resource = RuleTemplate([
    Rule('/${name}/', endpoint='${name}_index'),
    Rule('/${name}/create/', endpoint='create_${name}'),
    Rule('/${name}/update/<string:${var}>/', endpoint='update_${name}'),
    Rule('/${name}/delete/<string:${var}>/', endpoint='delete_${name}'),
])

url_map = Map([
    Rule('/', endpoint='index'),
    Rule('/<string:slug>/', endpoint='project_or_event'),
    Rule('/form/<string:key>/<string:slug>/', endpoint='form'),
    Submount('/account', [
        Rule('/', endpoint='account_index'),
        Rule('/create/', endpoint='create_account'),
        Rule('/update/', endpoint='update_account'),
        Rule('/delete/', endpoint='delete_account'),
        Rule('/event/cancel/<string:slug>/', endpoint='event_cancel'),
    ]),
    EndpointPrefix('admin_', [
        Submount('/admin', [
            resource(name='account', var='email'),
            resource(name='project', var='slug'),
            resource(name='event', var='slug'),
            resource(name='program', var='slug'),
            resource(name='application', var='slug'),
        ]),
    ]),
])
views.py
from werkzeug import redirect as wredirect
from urls import url_map

def reverse(endpoint, values=None):
    # Bind the map to an empty server name; we only need the URL path
    c = url_map.bind('')
    return wredirect(c.build(endpoint, values))

...
return reverse('form', dict(key=key, slug=slug))
...
You need to give the build function the full endpoint name. In the example above, the admin resources get endpoints like admin_create_${name}, where ${name} is the name of a resource, so that part has to be filled in when passing the endpoint to build:
...
return reverse('admin_create_event')
...
-
Field/column Queries in Django
2009-02-04 23:24:38
One of the neat things making its way into Django 1.1 is F object queries. The F object is similar to the Q object in that it can be used in queries, but it represents a database field on the right-hand side of a comparison.
For the example I'll use the example models from the "Making Queries" section of the Django Documentation.
from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __unicode__(self):
        return self.name

class Author(models.Model):
    name = models.CharField(max_length=50)
    email = models.EmailField()

    def __unicode__(self):
        return self.name

class Entry(models.Model):
    blog = models.ForeignKey(Blog)
    headline = models.CharField(max_length=255)
    body_text = models.TextField()
    pub_date = models.DateTimeField()
    authors = models.ManyToManyField(Author)
    n_comments = models.IntegerField()
    n_pingbacks = models.IntegerField()
    rating = models.IntegerField()

    def __unicode__(self):
        return self.headline
Here we can do cool stuff like query for blog entries that have fewer pingbacks than comments.
>>> from django.db.models import F
>>> Entry.objects.filter(n_pingbacks__lt=F('n_comments'))
You can perform arithmetic on columns or add columns together.
>>> Entry.objects.filter(n_pingbacks__lt=F('n_comments') * 2)
>>> Entry.objects.filter(rating__lt=F('n_comments') + F('n_pingbacks'))
You can even span relationships across tables:
>>> Entry.objects.filter(authors__name=F('blog__name'))
The resulting SQL looked like this (ftester is the name of the application I made to test it):
SELECT `ftester_entry`.`id`, `ftester_entry`.`blog_id`, `ftester_entry`.`headline`,
       `ftester_entry`.`body_text`, `ftester_entry`.`pub_date`, `ftester_entry`.`n_comments`,
       `ftester_entry`.`n_pingbacks`, `ftester_entry`.`rating`
FROM `ftester_entry`
INNER JOIN `ftester_blog` ON (`ftester_entry`.`blog_id` = `ftester_blog`.`id`)
INNER JOIN `ftester_entry_authors` ON (`ftester_entry`.`id` = `ftester_entry_authors`.`entry_id`)
INNER JOIN `ftester_author` ON (`ftester_entry_authors`.`author_id` = `ftester_author`.`id`)
WHERE `ftester_author`.`name` = `ftester_blog`.`name`
LIMIT 21
Note: As an aside, it's interesting that this query has LIMIT 21, so it only fetches 21 records. That limit comes from evaluating the queryset at the interactive prompt: a QuerySet's repr only fetches the first 21 rows (it displays at most 20 and uses the extra row to tell whether the list was truncated). Iterating over the queryset in normal code doesn't add this limit.
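Stripped of the joins, the F() filters above boil down to comparing and combining columns of the same row inside the database. Here is a rough sketch of that with Python's built-in sqlite3 module on a cut-down, hypothetical entry table. The SQL is hand-written to mirror each filter; it is not what Django actually generates, which qualifies and quotes column names differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT,"
    " n_comments INTEGER, n_pingbacks INTEGER, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (headline, n_comments, n_pingbacks, rating)"
    " VALUES (?, ?, ?, ?)",
    [("first", 5, 2, 9), ("second", 1, 3, 2), ("third", 4, 4, 1)],
)


def headlines(where):
    """Return headlines of rows matching a WHERE clause.

    `where` is a fixed literal in this sketch; never build SQL from user
    input this way."""
    return [row[0] for row in conn.execute(
        "SELECT headline FROM entry WHERE " + where + " ORDER BY id")]


# Roughly filter(n_pingbacks__lt=F('n_comments'))
fewer_pingbacks = headlines("n_pingbacks < n_comments")
# Roughly filter(n_pingbacks__lt=F('n_comments') * 2)
under_double = headlines("n_pingbacks < n_comments * 2")
# Roughly filter(rating__lt=F('n_comments') + F('n_pingbacks'))
low_rating = headlines("rating < n_comments + n_pingbacks")
```

Each comparison is evaluated per row by the database, so none of the column values have to be pulled into Python first.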
But the reason the F() object was created was to allow using the value of one column in another column during an update. This lets you do things like add 1 to the pingbacks of every entry in one go, without selecting the whole batch into Python and updating the field on each row.
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
-
Python date range iterator
2008-12-19 15:27:30
I couldn't find something that gave me quite what I wanted, so I created a simple Python generator to give me the dates between two datetimes.
def datetimeIterator(from_date, to_date):
    from datetime import timedelta
    if from_date > to_date:
        return
    else:
        while from_date <= to_date:
            yield from_date
            from_date = from_date + timedelta(days = 1)
        return
Update: It didn't take me long to realize that it wasn't as nice as it could have been.
from datetime import datetime,timedelta

# Note: the datetime.now() default is evaluated once, when the function is
# defined, not on every call
def datetimeIterator(from_date=datetime.now(), to_date=None):
    while to_date is None or from_date <= to_date:
        yield from_date
        from_date = from_date + timedelta(days = 1)
    return
Another Update based on the comments below:
from datetime import datetime,timedelta

def datetimeIterator(from_date=None, to_date=None, delta=timedelta(minutes=1)):
    from_date = from_date or datetime.now()
    while to_date is None or from_date <= to_date:
        yield from_date
        from_date = from_date + delta
    return
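Since to_date defaults to None, the last version keeps yielding forever unless you stop it. Here is a short usage sketch. The generator is copied from above so the block runs by itself, and itertools.islice takes just the first few values of an open-ended run.

```python
from datetime import datetime, timedelta
from itertools import islice


def datetimeIterator(from_date=None, to_date=None, delta=timedelta(minutes=1)):
    # Final version from above: the "now" default is computed on each call
    from_date = from_date or datetime.now()
    while to_date is None or from_date <= to_date:
        yield from_date
        from_date = from_date + delta


# Bounded: one value per day, both end points included
days = list(datetimeIterator(datetime(2008, 12, 19), datetime(2008, 12, 21),
                             delta=timedelta(days=1)))

# Unbounded (no to_date): islice stops after the first three one-minute steps
minutes = list(islice(datetimeIterator(datetime(2008, 12, 19, 15, 0)), 3))
```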
-
Introduction to Algorithms
2008-12-15 18:10:37
Today my copy of Introduction to Algorithms came in the mail (a gift from the family). I've decided, mostly inspired by Peteris Krumins, to revisit classic algorithms, as it's been a while since I've taken a look at them.
I have also decided to take a look at the MIT Intro to Algorithms course in order to revisit algorithms and concepts. I won't provide any lecture notes, since Peteris did a much better job of writing lecture notes than I ever could, but I did go ahead and create some Python implementations of the sorting algorithms covered in the first lecture. These haven't been tested extensively, so there might be bugs, but I'm pretty sure they're working. I'd be interested to see how well these work with large input data, particularly the merge sort.
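For both open questions (bugs and large inputs), here is a small harness sketch. check_sort() compares a sort against Python's built-in sorted() on random lists, and time_sort() times it with timeit. Both helper names are my own. The harness checks the list in place rather than the return value, since the sort() functions in the files below sort their argument in place. It is written for Python 3, so the Python 2 files need xrange changed to range (and / to // in the merge sort) before their sort() functions can be passed in.

```python
import random
import timeit


def check_sort(sort_fn, trials=200, max_len=50, seed=0):
    """Compare an in-place sort_fn against sorted() on random integer lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]
        candidate = list(data)
        sort_fn(candidate)  # sorts candidate in place; return value ignored
        if candidate != sorted(data):
            return False
    return True


def time_sort(sort_fn, n=1000, repeat=3, seed=0):
    """Best-of-`repeat` wall time in seconds for sorting n random floats."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    # Sort a fresh copy each time so every run sees unsorted input
    return min(timeit.repeat(lambda: sort_fn(list(data)), number=1, repeat=repeat))


if __name__ == "__main__":
    # list.sort stands in here for sort() from either file below
    print(check_sort(list.sort))
    for n in (1000, 10000, 100000):
        print(n, time_sort(list.sort, n=n))
```

Because insertion sort is O(n²) and merge sort is O(n log n), the gap between them should widen quickly as n grows. The recursive merge sort only recurses about log2(n) levels deep, so large inputs won't hit Python's recursion limit.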
insertion-sort.py
#!/usr/bin/env python
def sort(array):
    for j in xrange(1, len(array)):
        i = j - 1
        key = array[j]
        while i >= 0 and key < array[i]:
            array[i+1] = array[i]
            i = i - 1
        array[i+1] = key
    return array
merge-sort.py
#!/usr/bin/env python
def sort(array):
    mergesort(array, 0, len(array))
    return array

def mergesort(array, start, end):
    if end > start + 1:
        pivot = (start + end) // 2
        mergesort(array, start, pivot)
        mergesort(array, pivot, end)
        merge(array, start, pivot, end)

def merge(array, start, pivot, end):
    l = array[start:pivot]
    lenl = pivot - start
    r = array[pivot:end]
    lenr = end - pivot
    i = j = 0
    for k in xrange(start, end):
        if j >= lenr or (i < lenl and l[i] <= r[j]):
            array[k] = l[i]
            i += 1
        else:
            array[k] = r[j]
            j += 1
-
Django Sitemap Framework
2008-11-18 21:22:06
Using the Django sitemap framework is so easy it's almost no work at all. Just make a Sitemap subclass and add it to the sitemaps dictionary in urls.py. The sitemap framework calls items() on your Sitemap to get the list of objects to put in the sitemap and then calls get_absolute_url() on each object.
models.py
from django.db import models
from django.db.models import permalink
...
class Entry(models.Model):
    ...
    @permalink
    def get_absolute_url(self):
        return ...
    ...
sitemap.py
from django.contrib.sitemaps import Sitemap
from mysite.blog.models import Entry

class BlogSitemap(Sitemap):
    priority = 0.5

    def items(self):
        return Entry.objects.filter(is_draft=False)

    def lastmod(self, obj):
        return obj.pub_date

    # changefreq can be callable too
    def changefreq(self, obj):
        return "daily" if obj.comments_open() else "never"
urls.py
from mysite.blog.sitemap import BlogSitemap
...
sitemaps = {
    "blog": BlogSitemap,
}
...
    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),
...
You can even generate sitemap indexes, and the framework will paginate sitemaps at Google's limit of 50,000 URLs per file, so you don't run into problems with Google crawling them.
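To see roughly what the framework does with a Sitemap class, here is a toy renderer built on xml.etree.ElementTree. It is not Django's implementation. It only illustrates the idea of calling items(), building each <loc> from get_absolute_url(), and treating lastmod, changefreq and priority as either plain attributes or methods that take the item. FakeEntry and FakeBlogSitemap are hypothetical stand-ins for the Entry model and BlogSitemap above, and example.com is a placeholder domain.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def _value(sitemap, name, obj):
    """Sitemap options may be plain attributes or methods taking the item."""
    value = getattr(sitemap, name, None)
    return value(obj) if callable(value) else value


def render_sitemap(sitemap, domain):
    """Toy renderer: items() -> <url> entries with loc/lastmod/changefreq/priority."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for obj in sitemap.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = "http://%s%s" % (domain, obj.get_absolute_url())
        lastmod = _value(sitemap, "lastmod", obj)
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod.strftime("%Y-%m-%d")
        for name in ("changefreq", "priority"):
            value = _value(sitemap, name, obj)
            if value is not None:
                ET.SubElement(url, name).text = str(value)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical stand-ins for Entry and BlogSitemap, just to exercise the renderer
class FakeEntry:
    def __init__(self, slug, pub_date, open_for_comments):
        self.slug = slug
        self.pub_date = pub_date
        self.open_for_comments = open_for_comments

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug

    def comments_open(self):
        return self.open_for_comments


class FakeBlogSitemap:
    priority = 0.5

    def items(self):
        return [FakeEntry("hello", datetime(2008, 11, 18), True),
                FakeEntry("old-post", datetime(2008, 1, 1), False)]

    def lastmod(self, obj):
        return obj.pub_date

    def changefreq(self, obj):
        return "daily" if obj.comments_open() else "never"


sitemap_xml = render_sitemap(FakeBlogSitemap(), "example.com")
```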
-
Django admin inline forms
2008-11-09 01:09:55
For my new project dlife (Update: Now django-lifestream), I set about implementing a simple comments interface that would let users comment on imported feed items. I wanted to support this in the admin in the typical manner, so that when you click on an item in the admin, you can see all of its comments and edit them from the item's page.
I found that you can use inline forms in the admin, but it shows a bunch of blank forms (three by default) even when the item doesn't have any comments yet. I'll mess with this a bit more later to try to get the behavior I want.
models.py
from django.db import models
from django.contrib.auth.models import User

class Comment(models.Model):
    '''An item comment'''
    comment_item = models.ForeignKey(Item)
    comment_date = models.DateTimeField()
    comment_user = models.ForeignKey(User, null=True, blank=True)
    comment_name = models.CharField(max_length=30)
    comment_email = models.EmailField()
    comment_homepage = models.URLField(max_length=300)
    comment_content = models.TextField(null=True, blank=True)

    class Meta:
        db_table = "comments"
        ordering = ["comment_item", "-comment_date"]
admin.py
from django.contrib import admin

class CommentInline(admin.StackedInline):
    model = Comment
    # TODO: Fix this. max_num caps the total number of inline forms; the
    # number of blank forms shown is controlled separately by `extra`
    max_num = 1
    exclude = ['comment_item', 'content_type', 'object_id']

class ItemAdmin(admin.ModelAdmin):
    list_display = ('item_title', 'item_date')
    exclude = ['item_clean_content',]
    list_filter = ('item_feed',)
    search_fields = ('item_title', 'item_clean_content')
    list_per_page = 20
    inlines = [CommentInline,]
