Engineer in Tokyo

Feedparser and Django

Over the weekend at Python Onsen I worked on a lifestream web application using Django and feedparser. I was really impressed with how simple feedparser is to use and how easy it is to get unified results from Atom or RSS feeds. You simply import feedparser and call feedparser.parse to parse a feed from a URL.

feeds.py

...
def update_feeds():
  feeds = Feed.objects.filter(feed_deleted=False)
  for feed in feeds:
    try:
      feed_items = feedparser.parse(feed.feed_url)
      for entry in feed_items['entries']:
...

You can check out feeds.py here.

The interesting bit is how I had to parse the dates, which sometimes include timezone info and other goodies. While searching for a way to deal with dates in various formats I came across this blog entry, which describes the problem and some possible solutions. The solution I used was the simplest and most robust (please skip the comments talking about taking a slice of the date string). I took mikael's suggestion from the comments and used dateutil.parser to parse the date string into a proper datetime object.

# Parse to an actual datetime object
date_published = dateutil.parser.parse(date_published)

This, however, leaves timezone info on the timestamp, which MySQL doesn't support, so I hand-rolled some code to convert the timestamp to UTC and remove the timezone info.

# Change the date to UTC and remove timezone info since MySQL doesn't
# support it
date_published = (date_published - date_published.utcoffset()).replace(tzinfo=None)
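One case where the one-liner above can break is a date string with no timezone at all: the parsed datetime is then naive, utcoffset() returns None, and the subtraction raises a TypeError. Here is a minimal sketch of a guard for that case, using only the standard library. The helper name to_naive_utc is made up for illustration, and treating a naive value as already being in UTC is an assumption that may not hold for every feed.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt):
    """Return dt as a naive datetime expressed in UTC."""
    offset = dt.utcoffset()
    if offset is None:
        # Naive input: there is no offset to subtract.
        # Assumption: the value is already in UTC.
        return dt
    # Shift the wall-clock time by the offset, then drop the tzinfo.
    return (dt - offset).replace(tzinfo=None)


# A timestamp published at 09:30 Tokyo time (UTC+9).
tokyo = timezone(timedelta(hours=9))
aware = datetime(2009, 3, 1, 9, 30, tzinfo=tokyo)
print(to_naive_utc(aware))  # 2009-03-01 00:30:00

# A naive timestamp passes through unchanged instead of raising.
naive = datetime(2009, 3, 1, 0, 30)
print(to_naive_utc(naive))  # 2009-03-01 00:30:00
```

The same function can be applied directly to whatever dateutil.parser.parse returns, since that is an ordinary datetime object.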

I'm not sure this works in all situations yet, so I might switch to the approach another commenter used: converting feedparser's parsed date to a UTC timestamp before converting that to a datetime object. I think either way would work, but I'm not sure which is cleaner and less prone to breakage.
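For comparison, here is a rough sketch of that second approach, not the commenter's exact code. It assumes the parsed date is a time.struct_time already normalized to UTC, which is how I understand feedparser's *_parsed entry fields to work. The function name struct_time_to_naive_utc is hypothetical, and the sample value is built with time.strptime so the snippet runs without fetching a feed.

```python
import calendar
import time
from datetime import datetime, timezone


def struct_time_to_naive_utc(parsed):
    """Convert a UTC struct_time into a naive datetime in UTC."""
    # calendar.timegm treats the tuple as UTC; time.mktime would
    # wrongly interpret it as local time.
    timestamp = calendar.timegm(parsed)
    # Build the datetime in UTC, then drop tzinfo for storage.
    utc_dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return utc_dt.replace(tzinfo=None)


# Stand-in for a parsed date taken from a feed entry (assumed UTC).
sample = time.strptime("2009-03-01 00:30:00", "%Y-%m-%d %H:%M:%S")
print(struct_time_to_naive_utc(sample))  # 2009-03-01 00:30:00
```

This version never deals with a timezone-aware datetime coming out of the parser, so there is no offset arithmetic to get wrong, but it does rely on the parser having normalized the date to UTC first.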