Ian Lewis
Ian Lewis is a web developer living in Tokyo, Japan. His current interests are Django, Python, alternative databases and rapid web application development.
  • 'self' ForeignKeys always result in a JOIN

    I came across a little annoyance in Django today. I found that ForeignKeys that reference 'self', i.e. ones that point to the same table, always result in a JOIN when used in a filter.

    Take this normal foreign key reference.

       from django.contrib.auth.models import User
       from django.db import models

       class Customer(models.Model):
           user = models.ForeignKey(User)
    
       >>> Customer.objects.filter(user__isnull=True)._as_sql()
       ('SELECT U0."id" FROM "accounts_customer" U0 WHERE U0."user_id" IS NULL',
    ())
    

    Now let's look at a version of the Customer model with a self reference.

       class Customer(models.Model):
           user = models.ForeignKey(User)
           other_cust = models.ForeignKey('self')
    
       >>> Customer.objects.filter(other_cust__isnull=True)._as_sql()
       ('SELECT U0."id" FROM "accounts_customer" U0 LEFT OUTER JOIN "accounts_customer" U1 ON (U0."other_cust_id" = U1."id") WHERE U1."id" IS NULL',
    ())
    

    Hmm, yuck. That little extra JOIN is going to kill performance if the table is big. Let's do it the right way.

    >>> Customer.objects.extra(where=["other_cust_id IS NULL"])._as_sql()
    ('SELECT U0."id" FROM "accounts_customer" U0 WHERE other_cust_id IS NULL', ())
    

    Ahh, that's better. I don't really like using extra() but in situations like these I'm glad it's there.
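
    To convince myself that the two statements return the same rows, here is a minimal sketch using Python's sqlite3 module instead of Django. The table layout is a hypothetical stand-in for what the ORM would create, and the equivalence only holds as long as every other_cust_id that is set points at an existing row.

```python
import sqlite3

# Hypothetical stand-in for the table Django would create for Customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "accounts_customer" ('
    '"id" INTEGER PRIMARY KEY, '
    '"other_cust_id" INTEGER REFERENCES "accounts_customer" ("id"))'
)
# Customer 1 has no other_cust; customer 2 points at customer 1.
conn.executemany(
    'INSERT INTO "accounts_customer" ("id", "other_cust_id") VALUES (?, ?)',
    [(1, None), (2, 1)],
)

# The statement generated by filter(), with the self JOIN.
joined = conn.execute(
    'SELECT U0."id" FROM "accounts_customer" U0 '
    'LEFT OUTER JOIN "accounts_customer" U1 ON (U0."other_cust_id" = U1."id") '
    'WHERE U1."id" IS NULL'
).fetchall()

# The statement generated by extra(), which only reads the local column.
direct = conn.execute(
    'SELECT U0."id" FROM "accounts_customer" U0 WHERE other_cust_id IS NULL'
).fetchall()

print(joined, direct)
```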

  • Django template2pdf

    This is a cool Django application from Yasushi Masuda which allows you to render data to a PDF using trml2pdf.

    template2pdf provides a generic view called direct_to_pdf which will render an RML template directly to PDF.

    # coding: utf-8
    
    from django.http import HttpResponse
    from django_trml2pdf import direct_to_pdf
    
    from django.shortcuts import render_to_response
    
    def myview(request, template_name='trml2pdf/mytemplate.rml'):
        params = {}
        return HttpResponse(
            direct_to_pdf(request, template_name, params),
            mimetype='application/pdf')
    
  • Running django with daemontools

    Running django fastcgi with daemontools is rather easy but getting it to run in the foreground with the proper user takes a bit of knowledge about how bash works and the tools in daemontools.

    In order to run the fastcgi daemon in the foreground you need to specify the daemonize=false option to the fastcgi command.

    Next the daemon will be started as the root user unless the daemon has an option to change the user itself. The fastcgi daemon doesn't so we will use a tool from daemontools called setuidgid to set the user to www which is the user we want to run the daemon as.

    Finally, since we are using setuidgid, we need to use the exec command in bash so that the standard process pipes are established with the fastcgi process itself. exec replaces the shell with the command rather than starting it as a child process.

    /service/myapp/run

    #!/bin/bash
    
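    BASEDIR="/home/www"
    PIDFILE="$BASEDIR/app.pid"
    
    # exec replaces this shell with the fastcgi process, and setuidgid
    # switches to the www user before python is started.
    exec setuidgid www python "$BASEDIR/django-prj/manage.py" runfcgi \
        --settings=settings_production method=threaded port=8001 \
        pidfile="$PIDFILE" daemonize=false 2>&1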
    
  • Testing HTTPS with Django's Development Server

    Django's development server doesn't normally support HTTPS, so it's hard to test applications with HTTPS without deploying them to a real web server that supports it. The secret is to use two development server instances, one for HTTP and one for HTTPS, and to use a tool called stunnel to create an SSL tunnel to the second development server.

    First we need to install stunnel by following its documentation.

    After it's installed you can create a pem for stunnel.

    openssl req -new -days 365 -nodes -out newreq.pem -keyout /etc/stunnel/stunnel.pem
    

    After that we create a settings file (which I saved to a file called dev_https). The "accept" setting is the port of the HTTPS connection. The "connect" is the port of the development server instance we are using for https.

    pid =
    
    [https]
    accept=8002
    connect=8003
    

    After that we start the stunnel daemon.

    stunnel dev_https
    

    Now we start the Django development server instance we are going to use for https. The HTTPS=on environment variable allows request.is_secure() to return True properly.

    HTTPS=on python manage.py runserver 8003
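
    The HTTPS=on convention is not specific to Django. As a small illustration (this is not Django's own is_secure() code, which I'm not reproducing here), Python's standard wsgiref module applies the same kind of check when it guesses the URL scheme from an environ dictionary:

```python
from wsgiref.util import guess_scheme

# An environ that carries HTTPS=on, as set on the command line above.
secure_environ = {"HTTPS": "on"}
# An environ without the variable, as for the plain HTTP instance.
plain_environ = {}

secure_scheme = guess_scheme(secure_environ)
plain_scheme = guess_scheme(plain_environ)
print(secure_scheme, plain_scheme)
```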
    

    Then we start the http server.

    python manage.py runserver 8000
    

    So now you can connect to http://localhost:8000 and https://localhost:8002 to test your application using both HTTP and HTTPS.

  • Minimum cost for warming-up various frameworks(and more)

    My good friend Takashi Matsuo wrote an interesting blog post about the start-up times of various frameworks on App Engine. Because App Engine kills your server process, it often needs to load your application into memory from scratch. This can take a lot of time if a lot of modules are loaded.

    http://takashi-matsuo.blogspot.com/2009/10/minimum-cost-of-various-frameworks-cold.html
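
    To get a rough feel for this kind of cold-start cost on your own machine, here is a minimal sketch that times a fresh Python interpreter importing a single module. It is not Takashi's benchmark and has nothing App Engine specific in it; the helper name cold_import_seconds and the choice of modules are my own assumptions.

```python
import subprocess
import sys
import time

def cold_import_seconds(module_name):
    """Time a fresh interpreter that does nothing but import one module."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "import " + module_name], check=True)
    return time.perf_counter() - start

# Interpreter start-up alone versus start-up plus an extra import.
baseline = cold_import_seconds("sys")
with_json = cold_import_seconds("json")
print("baseline: %.3fs, with json: %.3fs" % (baseline, with_json))
```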

  • Testing using a mocked HTTP server

    Recently I got some tests working for my django-lifestream project. The lifestream imports data from RSS/Atom feeds so there isn't a good way to run tests without creating a test HTTP server to serve up your RSS/Atom.

    The tests start up an HTTP server in a separate thread which serves RSS/Atom/XML files from a set test directory. I copied the test HTTP server which was used for feedparser's tests. The code is entirely unreadable, but the important thing it does is read information about what to supply in the response headers from the XML file. This is useful for testing scenarios where, for example, the encoding in the response header is different from the encoding in the XML file.

    In order to get it to work I had to do a bit of threaded programming, which I'm pretty new to in Python. To have the main thread that runs the tests wait until the server has started properly, I used a Condition object from the threading library. The condition provides a way to hold a lock and notify another thread to stop waiting.

    In order for various tests to use this functionality I created a base test class. It looks something like this:

    #!/usr/bin/env python
    #:coding=utf-8:
    
    import urllib
    import threading
    import logging
    
    from django.test import TransactionTestCase as DjangoTestCase
    
    from testserver import PORT,FeedParserTestServer,stop_server
    
    class BaseTest(DjangoTestCase):
        base_url = "http://127.0.0.1:%s/%s"
    
        def setUp(self):
            # Disable logging to the console
            logging.disable(logging.CRITICAL+1)
    
            self.cond = threading.Condition()
            self.server = FeedParserTestServer(self.cond)
            self.cond.acquire()
            self.server.start()
    
            # Wait until the server is ready
            while not self.server.ready:
                # Collect left over servers so they release their
                # sockets
                import gc
                gc.collect()
                self.cond.wait()
    
            self.cond.release()
    
        def get_url(self, path):
            return self.base_url % (PORT, path)
    
        def tearDown(self):
            self.server = None
            stop_server(PORT)
    

    The server thread takes the condition object and starts the mock webserver.

    from threading import Thread
    
    class FeedParserTestServer(Thread):
        """HTTP Server that runs in a thread and handles a predetermined number of requests"""
        TIMEOUT=10
    
        def __init__(self, cond=None):
            Thread.__init__(self)
            self.ready = False
            self.cond = cond
    
        def run(self):
            self.cond.acquire()
            timeout=0
            self.httpd = None
            while self.httpd is None:
                try:
                    self.httpd = StoppableHttpServer(('', PORT), FeedParserTestRequestHandler)
                except Exception, e:
                    import socket,errno,time
                    if isinstance(e, socket.error) and errno.errorcode[e.args[0]] == 'EADDRINUSE' and timeout < self.TIMEOUT:
                        timeout+=1
                        time.sleep(1)
                    else:
                        # Set the flag before notifying so the waiting
                        # thread sees it as soon as it wakes up.
                        self.ready = True
                        self.cond.notifyAll()
                        self.cond.release()
                        raise
            self.ready = True
            if self.cond:
                self.cond.notifyAll()
                self.cond.release()
            self.httpd.serve_forever()
    

    The important part with conditions is that both threads need to call the acquire() method in order for blocking to occur. I got confused when one thread complained that I hadn't acquired the condition when I had already done so in another thread. It's important that both threads attempt to acquire the lock.

    So thread 1, the main thread, acquires the lock and starts thread 2 which also acquires the lock. This doesn't block right away as it would block forever. Instead thread 1 calls wait() and blocks until notified. Thread 2 attempts to start the HTTP server and when finished calls notifyAll() which notifies thread 1 to stop waiting and continue with testing.
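
    The handshake described above can be boiled down to a few lines. This is a minimal sketch written for Python 3 (where notifyAll() is spelled notify_all()); the Worker class is a hypothetical stand-in for the server thread and only pretends to start a server.

```python
import threading

class Worker(threading.Thread):
    """Hypothetical stand-in for the server thread."""

    def __init__(self, cond):
        threading.Thread.__init__(self)
        self.cond = cond
        self.ready = False

    def run(self):
        # Blocks here until the main thread gives up the lock inside wait().
        with self.cond:
            self.ready = True        # pretend the server has started
            self.cond.notify_all()   # wake up the waiting main thread

cond = threading.Condition()
worker = Worker(cond)

with cond:                   # thread 1 acquires the lock first
    worker.start()           # thread 2 now tries to acquire it too
    while not worker.ready:
        cond.wait()          # releases the lock while blocked

worker.join()
print(worker.ready)
```

    The while loop around wait() matters: the flag is checked again after every wake-up, so a notification that arrives for some other reason doesn't let the main thread continue too early.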

    Because this method starts a server in the setUp() method and stops it in the tearDown() method, a new thread and server are started for each test in each TestCase that extends BaseTest. Because socket connections don't release their port until they are garbage collected, there is a little bit in there to get the garbage collector to do its thing so we can start up the next server on the same port. We also have a timeout in thread 2 which causes it to try to start the server a number of times before giving up.

    In order to stop the server in the tearDown() I used a stoppable HTTP server that implements the QUIT HTTP method that tells the server to stop.

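    import httplib
    from BaseHTTPServer import HTTPServer
    from SimpleHTTPServer import SimpleHTTPRequestHandler
    
    class FeedParserTestRequestHandler(SimpleHTTPRequestHandler):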
        # Some other stuff here ...
    
        def do_QUIT(self):
            """send 200 OK response, and set server.stop to True"""
            self.send_response(200)
            self.end_headers()
            self.server.stop = True
    
    class StoppableHttpServer(HTTPServer):
        """http server that reacts to self.stop flag"""
    
        def serve_forever (self):
            """Handle one request at a time until stopped."""
            self.stop = False
            while not self.stop:
                self.handle_request()
    
    def stop_server(port):
        """send QUIT request to http server running on localhost:<port>"""
        conn = httplib.HTTPConnection("127.0.0.1:%d" % port)
        conn.request("QUIT", "/")
        conn.getresponse()
    

    The do_QUIT method is executed when a request with the QUIT HTTP method is sent to the server. The stop_server function sends a QUIT request to the server to stop it.
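
    For anyone who wants to try the same idea on Python 3, here is a minimal self-contained sketch using http.server and http.client. It is not the django-lifestream code: the class names, the serve_until_stopped method and the use of port 0 (which asks the OS for any free port) are assumptions made for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class QuitHandler(BaseHTTPRequestHandler):
    def do_QUIT(self):
        """Answer 200 OK and ask the server loop to finish."""
        self.send_response(200)
        self.end_headers()
        self.server.stop = True

    def log_message(self, format, *args):
        pass  # keep the console quiet

class StoppableHTTPServer(HTTPServer):
    stop = False

    def serve_until_stopped(self):
        """Handle one request at a time until a QUIT request arrives."""
        while not self.stop:
            self.handle_request()
        self.server_close()

httpd = StoppableHTTPServer(("127.0.0.1", 0), QuitHandler)
port = httpd.server_address[1]
thread = threading.Thread(target=httpd.serve_until_stopped)
thread.start()

# Equivalent of stop_server(): send a single QUIT request.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("QUIT", "/")
status = conn.getresponse().status
conn.close()

thread.join(timeout=5)
print(status, thread.is_alive())
```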

    There you have it. This code seems to work in Linux but I'm not sure if it is very portable code. If someone wants to give it a try and let me know the results I'd be eternally grateful.

  • Annoying things about Django

    Since I've been using it for a while now, I've gotten a good idea of what is good and what is annoying about development with Django. This might seem a little trite in parts, since some of these gripes are with features that don't exist in other frameworks, but in the spirit of perhaps making Django more flexible without ruining its ease of use I've come up with some annoying spots and possible ideas for fixing them.

    New Password Request Time Limit

    The new password request time limit can only be specified in days. This makes it impossible to use if you want to shorten the new password request time limit to less than 1 day.
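
    As far as I understand it, the limit is checked against a timestamp that only counts whole days, which is why nothing shorter than a day can be expressed. The following is a simplified model of that idea and not Django's actual code; TIMEOUT_DAYS, day_stamp and is_expired are names made up for this sketch.

```python
from datetime import date

TIMEOUT_DAYS = 1  # hypothetical stand-in for the day-based setting

def day_stamp(day):
    """Whole days since an arbitrary epoch; the time of day is discarded."""
    return (day - date(2001, 1, 1)).days

def is_expired(issued_on, today, timeout_days=TIMEOUT_DAYS):
    """A request expires once more than timeout_days whole days have passed."""
    return day_stamp(today) - day_stamp(issued_on) > timeout_days

next_day = is_expired(date(2009, 10, 1), date(2009, 10, 2))
two_days_later = is_expired(date(2009, 10, 1), date(2009, 10, 3))
print(next_day, two_days_later)
```

    With a resolution of one day, a request made late in the evening and one made early in the morning of the same day look identical, so an "expires after two hours" rule would need a finer-grained timestamp.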

    Admin Users / Site users

    This is an issue that largely has to do with winning people over rather than any kind of technical problem, but generally people see admin users and users of the site differently. This means they get really scared when they log into the admin and are automatically logged into the site, and vice versa.

    "You mean if you accidentally flip the staff flag (and allow access to the admin from internet IPs) then a regular user could just log into the admin?" And I have to answer: Yup. This for some reason scares the shit out of people. It scares even other developers, let alone website managers and customers. They see admins and site users as completely different identities and they expect the login to the admin and the login to the site to be completely separate.

    I know this really messes up django's design and application philosophy. For instance, with the contrib.auth module you couldn't be logged into the admin and the site unless you could specify a session cookie on a per application basis. But isn't there something that could be done? I'm forced by other folks to basically reimplement the auth module for every project because there is only one users table.

    GenericForeignKeys Look Awful in the Admin

    Generic foreign keys actually consist of two fields, a link to the content type and the actual key. But these look like crap in the admin because they are rendered separately, as a dropdown for the content type and an integer field for the key. It would be nicer if you could specify which kinds of models were possible as content type targets and provide a better widget for them.

    send_mail Uses the DEFAULT_CHARSET Setting

    We have had to subclass the EmailMessage object with our own message format to get it to work with encodings other than the DEFAULT_CHARSET. Email headers are also always encoded in UTF-8 regardless of the DEFAULT_CHARSET. This is especially annoying when sending email to Japanese cellphones, which sometimes expect ISO-2022-JP or Shift_JIS rather than whatever your DEFAULT_CHARSET is. And changing the DEFAULT_CHARSET just because you have to send some one-off emails is out of the question.
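
    For comparison, this is roughly what we want to end up with, written with nothing but Python's standard email package. It's only a sketch of a message whose body and Subject header both use ISO-2022-JP, not our EmailMessage subclass; the addresses and texts are placeholders.

```python
from email.header import Header
from email.mime.text import MIMEText

CHARSET = "iso-2022-jp"  # chosen per message, independent of any global default

# Body encoded with the per-message charset.
msg = MIMEText("こんにちは", "plain", CHARSET)
# Header encoded with the same charset instead of UTF-8.
msg["Subject"] = Header("テスト", CHARSET)
msg["From"] = "sender@example.com"       # placeholder address
msg["To"] = "recipient@example.com"      # placeholder address

raw = msg.as_string()
print(raw)
```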

    Can't Filter an Admin List via a JOIN, e.g. user__group

    It would be cool if you could filter an admin list via a join. Something like:

    class MyAdmin(admin.ModelAdmin):
        list_display    = ('name','user')
        list_filter     = ('user__group',)
        model = MyModel
    

    That way you could filter MyModel records by group.

    There is no Good Way to Simply View Data/Fields in the Admin

    In the admin there is no good way to have read-only fields on a form. There are many instances where you would like to show data in the admin but not allow it to be edited. Creation time fields come to mind. Right now there are workarounds which show the value and add the field as a hidden field in the form. This mostly works but is a crappy workaround and a potential security problem if you have people using the admin who aren't fully trusted.

    Applications Don't Have Their Own Settings

    This means that applications can't easily add settings (types etc.) to other applications. This results in situations like django-notifications, where notification types are stored in the database and need to be added via syncdb. django-notifications does this because it wants other applications to be able to add notification types easily. Applications could do that instead by providing their own settings for things like notifications.

  • Custom Admin Views and Reversing Django Admin URLs

    I recently used the new feature in Django 1.1 for reversing django admin urls and specifying custom admin views in my project django-lifestream.

    django-lifestream has a custom admin view which allows users to update the lifestream manually. The code looks like the following:

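    from django.contrib import admin
    from django.core.urlresolvers import reverse
    from django.http import HttpResponseRedirect
    from django.utils.html import strip_tags
    
    from lifestream.models import Item  # assumed location of the Item model
    
    class ItemAdmin(admin.ModelAdmin):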
        list_display    = ('title', 'date','published')
        exclude         = ['clean_content',]
        list_filter     = ('feed',)
        search_fields   = ('title','clean_content')
        list_per_page   = 20
    
        model = Item
    
        def save_model(self, request, obj, form, change):
            obj.clean_content = strip_tags(obj.content)
            obj.save()
    
        def admin_update_feeds(self, request):
            from lifestream.feeds import update_feeds
            #TODO: Add better error handling
            update_feeds()
            return HttpResponseRedirect(
                    reverse("admin:lifestream_item_changelist")
            )
    
        def get_urls(self):
            from django.conf.urls.defaults import patterns, url
            urls = super(ItemAdmin, self).get_urls()
            my_urls = patterns('',
                url(
                    r'update_feeds',
                    self.admin_site.admin_view(self.admin_update_feeds),
                    name='admin_update_feeds',
                ),
            )
            return my_urls + urls
    
    admin.site.register(Item, ItemAdmin)
    

    The key parts are the get_urls function and the admin_update_feeds view. The get_urls method returns the urls for this admin to which we are adding our custom view. The custom view does the updating of the lifestream feeds and returns the user to the Item model's changelist view. We get the url for that view by calling reverse with the pattern "<namespace>:<app>_<model>_changelist" which in our case is "admin:lifestream_item_changelist" since the django admin uses the admin namespace.
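
    The naming pattern is easy to get wrong by hand, so here is a tiny sketch of how the name is put together. admin_url_name is a hypothetical helper written for this post, not something Django provides; the string it returns is what you would pass to reverse().

```python
def admin_url_name(app_label, model_name, view, namespace="admin"):
    """Build a name such as 'admin:lifestream_item_changelist'."""
    return "%s:%s_%s_%s" % (namespace, app_label, model_name, view)

changelist_name = admin_url_name("lifestream", "item", "changelist")
print(changelist_name)
```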

    I created the button for updating the feeds by overriding the default admin template with my own template that extends it. The template looks like the following:

    {% extends "admin/change_list.html" %}
    {% load adminmedia admin_list i18n %}
    
    {% block object-tools %}
    {% if has_add_permission %}
    <ul class="object-tools">
      <li><a href="{% url admin:admin_update_feeds %}">{% blocktrans with cl.opts.verbose_name|escape as name %}Update Items{% endblocktrans %}</a></li>
      <li><a href="add/{% if is_popup %}?_popup=1{% endif %}" class="addlink">{% blocktrans with cl.opts.verbose_name|escape as name %}Add {{ name }}{% endblocktrans %}</a></li>
    </ul>
    {% endif %}
    {% endblock %}
    

    Here I'm getting the url for my custom admin view with the code {% url admin:admin_update_feeds %}, "admin_update_feeds" being the name I supplied in the get_urls method above.

  • Django and nginx settings

    One problem I keep encountering when setting up fastcgi with Django is that the default nginx fastcgi parameters cause Django to load the top URL no matter what URL you try to go to. This is because the default nginx fastcgi parameters pass the request path to the Django instance as the SCRIPT_NAME parameter, which Django interprets incorrectly. To fix this you need to pass it as PATH_INFO instead of SCRIPT_NAME.

    fastcgi_param PATH_INFO $fastcgi_script_name;
    fastcgi_param REQUEST_METHOD $request_method;
    fastcgi_param QUERY_STRING $query_string;
    fastcgi_param CONTENT_TYPE $content_type;
    fastcgi_param CONTENT_LENGTH $content_length;
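
    To see why the parameter name matters, here is a deliberately simplified model of how a WSGI-style application picks the path to route on. It is a sketch of the general convention, not Django's actual request handling; requested_path and the example paths are made up for illustration.

```python
def requested_path(environ):
    """Return the path an application would route on: PATH_INFO, or '/' if empty."""
    return environ.get("PATH_INFO") or "/"

# Default parameters: the whole path ends up in SCRIPT_NAME.
default_params = {"SCRIPT_NAME": "/blog/2009/", "PATH_INFO": ""}
# Fixed parameters: the path is passed as PATH_INFO.
fixed_params = {"SCRIPT_NAME": "", "PATH_INFO": "/blog/2009/"}

print(requested_path(default_params))
print(requested_path(fixed_params))
```

    In the first case the application only ever sees the top URL, which matches the symptom described above.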
    
  • Google Appengine SDK 1.2.3

    The Google Appengine SDK 1.2.3 was just released and contains some often asked for goodies such as Django 1.0 support and support for a task queue API.

    I haven't found much information about the Django 1.0 version in Appengine but here are some links with some related information about the Task Queue API.

    The code looks something like the code below. You tell the task queue that you have some work to do later and which URL the worker is located at. The worker is then called via a web hook POST request with the parameters you gave it. The request is limited to 30 seconds like most requests. The task queue will keep retrying the work until it gets a 200 OK response. That isn't to say that you should just return a 500 HTTP status if your worker cannot complete in time. If you have more work, your worker should add itself back to the queue and return 200 OK.

    Tasks are executed as soon as possible and only if there is work so it's quite a bit different from the cron support which runs every so often regardless of whether there is work or not. Based on the demo from Google I/O it runs faster than normal requests so you might even have some work finished before the request that added the work to the task queue finishes and gets back to your browser!

    import wsgiref.handlers
    from google.appengine.api.labs import taskqueue
    from google.appengine.ext import db
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp import template
    
    class Counter(db.Model):
      count = db.IntegerProperty(indexed=False)
    
    class CounterHandler(webapp.RequestHandler):
      def get(self):
        self.response.out.write(template.render('counters.html',
                                                {'counters': Counter.all()}))
    
      def post(self):
        key = self.request.get('key')
    
        # Add the task to the default queue.
        taskqueue.add(url='/worker', params={'key': key})
    
        self.redirect('/')
    
    class CounterWorker(webapp.RequestHandler):
      def post(self):
        key = self.request.get('key')
        def txn():
          counter = Counter.get_by_key_name(key)
          if counter is None:
            counter = Counter(key_name=key, count=1)
          else:
            counter.count += 1
          counter.put()
        db.run_in_transaction(txn)
    
    def main():
      wsgiref.handlers.CGIHandler().run(webapp.WSGIApplication([
        ('/', CounterHandler),
        ('/worker', CounterWorker),
      ]))
    
    if __name__ == '__main__':
      main()
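
    The "add yourself back to the queue and return 200 OK" advice can be modelled without App Engine at all. The following is a toy simulation in plain Python: the deque stands in for the task queue, and worker, run_queue and the chunk size of two are assumptions made for the sketch rather than anything from the SDK.

```python
from collections import deque

queue = deque()
processed = []

def worker(params):
    """Do one bounded chunk of work, re-enqueue the rest, report success."""
    remaining = params["remaining"]
    chunk, rest = remaining[:2], remaining[2:]
    processed.extend(chunk)
    if rest:
        # More to do: add a follow-up task instead of failing this one.
        queue.append({"remaining": rest})
    return 200

def run_queue():
    """Toy dispatcher: a task is retried until the worker answers 200."""
    while queue:
        task = queue.popleft()
        if worker(task) != 200:
            queue.append(task)

queue.append({"remaining": [1, 2, 3, 4, 5]})
run_queue()
print(processed)
```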
    