Ian Lewis
Ian Lewis is a web developer living in Tokyo, Japan. His current interests are in Django, Python, alternative databases and rapid web application development.
  • Writing Schema migrations for Appengine using the Mapper Class and the deferred Library

    One thing that many people using Appengine know is that writing schema migrations is hard. Improving performance on Appengine often revolves around getting objects by key or key name rather than using filters. However, altering the makeup of an object's key requires pulling all the objects and saving them in the datastore anew. It also requires modifying the ReferenceProperties of any objects that point to the changed object. On top of that, schema migrations generally modify lots of data, so you have limits on the number of objects returned by a filter and request timeouts to worry about.

    Fortunately, the Appengine SDK provides a task queue and a very convenient way of using it in the deferred library. The deferred library allows you to set a function to be run by the task queue in the background. This, coupled with the Mapper class provided in the article, makes for a powerful way to process large amounts of data safely. Unfortunately, there are a couple of bugs in the Mapper class provided in the article: it's missing a couple of imports, doesn't save data properly, and throws errors when there is no data to be processed. I have provided an updated version of the Mapper class here.

    from google.appengine.ext import db
    
    from google.appengine.ext import deferred
    from google.appengine.runtime import DeadlineExceededError
    
    class Mapper(object):
        # Subclasses should replace this with a model class (eg, model.Person).
        KIND = None
    
        # Subclasses can replace this with a list of (property, value) tuples to filter by.
        FILTERS = []
    
        def __init__(self):
            self.to_put = []
            self.to_delete = []
    
        def map(self, entity):
            """Updates a single entity.
    
            Implementers should return a tuple containing two iterables (to_update, to_delete).
            """
            return ([], [])
    
        def finish(self):
            """Called when the mapper has finished, to allow for any final work to be done."""
            self._batch_write()
    
        def get_query(self):
            """Returns a query over the specified kind, with any appropriate filters applied."""
            q = self.KIND.all()
            for prop, value in self.FILTERS:
                q.filter("%s =" % prop, value)
            q.order("__key__")
            return q
    
        def run(self, batch_size=100):
            """Starts the mapper running."""
            self._continue(None, batch_size)
    
        def _batch_write(self):
            """Writes updates and deletes entities in a batch."""
            if self.to_put:
                db.put(self.to_put)
                self.to_put = []
            if self.to_delete:
                db.delete(self.to_delete)
                self.to_delete = []
    
        def _continue(self, start_key, batch_size):
            q = self.get_query()
            # If we're resuming, pick up where we left off last time.
            if start_key:
                q.filter("__key__ >", start_key)
            # Keep updating records until we run out of time.
            try:
                # Steps over the results, returning each entity and its index.
                for i, entity in enumerate(q):
                    map_updates, map_deletes = self.map(entity)
                    self.to_put.extend(map_updates)
                    self.to_delete.extend(map_deletes)
                    # Do updates and deletes in batches.
                    if (i + 1) % batch_size == 0:
                        self._batch_write()
                    # Record the last entity we processed.
                    start_key = entity.key()
            except DeadlineExceededError:
                # Write any unfinished updates to the datastore.
                self._batch_write()
                # Queue a new task to pick up where we left off.
                deferred.defer(self._continue, start_key, batch_size)
                return
            self.finish()
    

    The Mapper class processes all objects by default, but you can add filters using the FILTERS property to select only certain objects. Creating a Mapper class is easy: you just implement the map() method (and optionally override the finish() method) and return a two-tuple containing a list of objects to update or create and a list of objects to delete. These objects are then saved in batches automatically by the Mapper class.
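
    To make the batching and resume behaviour easier to follow, here is a rough plain-Python sketch of the same loop. It does not use any Appengine APIs: FakeDeadline stands in for DeadlineExceededError, process() and run_all() are names I made up for this illustration, and a sorted list of keys stands in for the query ordered by __key__.

```python
# Plain-Python sketch of the Mapper's batch-and-resume loop.
# FakeDeadline, process() and run_all() are hypothetical stand-ins;
# no App Engine APIs are used.

class FakeDeadline(Exception):
    """Stands in for DeadlineExceededError."""


def process(keys, start_key=None, batch_size=3, budget=5):
    """Visit sorted keys greater than start_key, flushing writes in batches.

    Returns (written, resume_key); resume_key is None once every key is done.
    """
    pending, written = [], []
    try:
        remaining = [k for k in sorted(keys) if start_key is None or k > start_key]
        for i, key in enumerate(remaining):
            if i >= budget:
                # Simulate the request running out of time.
                raise FakeDeadline()
            pending.append(key)
            if (i + 1) % batch_size == 0:
                written.extend(pending)
                pending = []
            # Remember the last key handled so a later run can resume after it.
            start_key = key
    except FakeDeadline:
        written.extend(pending)  # flush unfinished work before giving up
        return written, start_key
    written.extend(pending)
    return written, None


def run_all(keys):
    """Keep re-running process() the way a re-queued task would."""
    done, resume = [], None
    while True:
        written, resume = process(keys, resume)
        done.extend(written)
        if resume is None:
            return done
```

    The important detail is that the resume key is updated for every entity, not only when a batch is written, so a task that picks up after a timeout never reprocesses entities that were already handled.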

    Let's create a simple Mapper implementation to update the schema for a Model.

    from google.appengine.ext import deferred
    
    from mapper import Mapper
    from mymod import MyModel
    
    class MyModelMapper(Mapper):
        KIND = MyModel
    
        def map(self, entity):
            if entity.key().name():
                return ([], [])
    
            new_entity = MyModel(
                key_name = str(entity.key().id()),
                value = entity.value,
            )
    
            return ([new_entity], [entity])
    
    def run_migration():
        m = MyModelMapper()
        deferred.defer(m.run)
    

    This mapper migrates MyModel entities from numeric ids to key names. Of course, if any other objects referred to your MyModel objects you would need to alter those too, but this demonstrates some of the things you can do with the Mapper class. Here you would just need to call the run_migration() function and it would add the mapper to the task queue to be run in the background.
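
    The extra step of repointing references can be sketched without the datastore at all. In this illustration plain dicts stand in for entities, and migrate_with_references() is a made-up helper, not something the Mapper class provides.

```python
# Dicts stand in for datastore entities; all names here are illustrative only.

def migrate_with_references(targets, referrers):
    """Re-key targets by str(id) and repoint referrers at the new key names.

    targets:   {numeric_id: value}
    referrers: {referrer_name: numeric_id_of_target}
    """
    # New key name for every old numeric id, mirroring
    # key_name=str(entity.key().id()) in the mapper above.
    new_names = dict((old_id, str(old_id)) for old_id in targets)
    migrated = dict((new_names[old_id], value) for old_id, value in targets.items())
    # Every referrer that pointed at an old id must now point at the key name.
    repointed = dict((name, new_names[old_id]) for name, old_id in referrers.items())
    return migrated, repointed
```

    In a real migration the second step would probably be its own Mapper subclass over the referring kind, run after the first one has finished.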

  • Minimum cost for warming-up various frameworks(and more)

    My good friend Takashi Matsuo wrote an interesting blog post about the startup times of various frameworks on Appengine. Because Appengine kills your server process, it often needs to load your application into memory from scratch. This can take a lot of time if a lot of modules are loaded.

    http://takashi-matsuo.blogspot.com/2009/10/minimum-cost-of-various-frameworks-cold.html

  • Smipple

    http://www.smipple.net/static/img/smipple_header.png

    Yesterday I released a pet project I had been working on called Smipple. Smipple is a service for saving, organizing, and sharing snippets of code. I originally decided to create it because I was a user of Snipplr but I was frustrated because it was slow and hard to use and the XML-RPC api was buggy. There didn't seem to be much response from the author or changes to the website either.

    So from there I decided that I would create it as a challenge because I had wanted to create an actual website that people could use including implementation and marketing (there's no point to creating it if people don't use it) on appengine. I thought that there weren't many sites utilizing appengine that were used broadly and wanted to try and create one.

    Smipple is the result of about two months of solid part-time development in my free time, stretched over about 6 months. Much of that was spent attempting to design the website myself, eventually giving up and having a proper designer design the site, and then reintegrating the new design. This was also my first real Appengine project, so there were many things I had to learn along the way, such as how to denormalize the data while still keeping it in a somewhat consistent state in the case of failures. Dealing with how to save the social network and create the dashboard was also interesting. I'll talk about these in later blog posts.

    Smipple was originally conceptualized with my friend Takashi Matsuo as a social code sharing site that would utilize an Open Social application. But it became hard to visualize how we would integrate users from different networks and whether what I wanted to achieve could really be done with Open Social. After that, the Open Social application was put on hold and the site itself was created. It would eventually serve users from various sites by virtue of Smipple having its own social network and importing their contacts from existing sites (a feature that wasn't actually finished at launch).

    Smipple is still missing many features that I thought needed to be on the site, but I had already taken too much time with it and wanted to release it early to get user feedback. Wasting more time on what I thought was important wasn't an option anymore. So far that has worked out, I think, as there is some good feedback on Smipple's feedback forum. Smipple has so far received a fair amount of criticism for its lack of features, but I hope to resolve those quickly now that I know which features people want and what priority to attach to them.

    I'll be updating Smipple often, as it has been pretty exciting to get feedback about the service. I didn't really put a "beta" label on Smipple, but it can certainly be thought of as "beta" in terms of the number of features and how much work still needs to be done on the site. I hope you stick around as Smipple grows.

  • Google Appengine SDK 1.2.3

    The Google Appengine SDK 1.2.3 was just released and contains some often asked for goodies such as Django 1.0 support and support for a task queue API.

    I haven't found much information about the Django 1.0 version in Appengine but here are some links with some related information about the Task Queue API.

    The code looks something like the code below. You tell the task queue that you have some work to do later and which URL the worker is located at. The worker is then called via a Web Hook POST request with the parameters you gave it. The request is limited to 30 seconds like most requests. The task queue will continue to retry the work until it gets a 200 OK response. (That isn't to say that you should just return a 500 HTTP status if your worker cannot complete in time. If you have more work, your worker should add itself back to the queue and return 200 OK.)

    Tasks are executed as soon as possible and only if there is work so it's quite a bit different from the cron support which runs every so often regardless of whether there is work or not. Based on the demo from Google I/O it runs faster than normal requests so you might even have some work finished before the request that added the work to the task queue finishes and gets back to your browser!

    import wsgiref.handlers
    from google.appengine.api.labs import taskqueue
    from google.appengine.ext import db
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp import template
    
    class Counter(db.Model):
      count = db.IntegerProperty(indexed=False)
    
    class CounterHandler(webapp.RequestHandler):
      def get(self):
        self.response.out.write(template.render('counters.html',
                                                {'counters': Counter.all()}))
    
      def post(self):
        key = self.request.get('key')
    
        # Add the task to the default queue.
        taskqueue.add(url='/worker', params={'key': key})
    
        self.redirect('/')
    
    class CounterWorker(webapp.RequestHandler):
      def post(self):
        key = self.request.get('key')
        def txn():
          counter = Counter.get_by_key_name(key)
          if counter is None:
            counter = Counter(key_name=key, count=1)
          else:
            counter.count += 1
          counter.put()
        db.run_in_transaction(txn)
    
    def main():
      wsgiref.handlers.CGIHandler().run(webapp.WSGIApplication([
        ('/', CounterHandler),
        ('/worker', CounterWorker),
      ]))
    
    if __name__ == '__main__':
      main()
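
    The retry rule described above (a task is retried until the worker answers 200 OK, and a worker with leftover work should re-enqueue itself and still answer 200) can be tried out with a toy in-memory queue. ToyQueue and count_worker are invented for this sketch and have nothing to do with the real taskqueue API.

```python
# Toy in-memory queue; ToyQueue and its methods are hypothetical,
# not the App Engine taskqueue API.
from collections import deque


class ToyQueue(object):
    def __init__(self):
        self.tasks = deque()

    def add(self, worker, params):
        self.tasks.append((worker, params))

    def run(self, max_attempts=100):
        """Run tasks in order, retrying any task whose worker does not return 200."""
        attempts = 0
        while self.tasks and attempts < max_attempts:
            worker, params = self.tasks.popleft()
            attempts += 1
            status = worker(self, params)
            if status != 200:
                # Anything other than 200 OK means "try again later".
                self.tasks.append((worker, params))
        return attempts


def count_worker(queue, params):
    """Process at most 2 items per call, then re-enqueue the remainder."""
    items = params["items"]
    params["done"].extend(items[:2])
    if items[2:]:
        queue.add(count_worker, {"items": items[2:], "done": params["done"]})
    return 200  # this slice of work succeeded
```

    Splitting the work into slices like this keeps each worker call short, which is the point of re-enqueueing instead of returning an error status.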
    
  • Google IO 2009

    This year I attended Google IO and had so much fun that I think I'll have to break it up into several blog posts. Google IO is held in San Francisco and is the #1 Google event of the year. About 4000 or so developers attended the event which was held in the Moscone West Convention Center. In this post I'll kind of give the history of the events leading up to the event and some context.

    One day in late January or so, Google announced that registration for this year's Google IO was open and there was a subsequent conversation on twitter that ensued which led to id:tmatsuo, id:a2c, id:voluntas, myself, and several others proclaiming they were going to attend this year's event.

    I found out that the office of one of my old classmates, Bob Ippolito, is close to the event, so I sort of jokingly invited myself and id:tmatsuo to stay at his office. You might know Bob as the co-founder and CTO of a company called MochiMedia, as the author of numerous pieces of well-used free software such as simplejson, MochiWeb, MochiKit, or PyObjC, or as the coiner of the phrase JSONP (Note the date of the blog post). He has consistently gotten into things before they become big, including technologies such as JSON, Erlang, and non-RDBMS databases, and he often gives talks about them which essentially give you key insights into the future of web programming.

    Bob responded by offering to let me and id:tmatsuo stay at his apartment. However, when it came close to the event, Bob realized that he was going to be at another conference for Flash developers in Boston, which meant that we would only have one day where we were both in San Francisco. Bummer! But he let us stay at his house even though he wasn't going to be there, saving us a lot of money. He was very gracious.

    Me and Bob

    Unfortunately, a few folks such as id:voluntas couldn't go to the event because of overreactions by their employers to the outbreak of swine flu but almost everyone that did go arrived in San Francisco on May 26th. I'll continue in a later blog post!!

  • Transactions on Appengine

    The way to store data on Appengine is with Google's BigTable Datastore, which has support for transactions. However, the transactions are quite limited in that:

    1. You can only execute callables inside transactions. Which means you basically call run_in_transaction() on a function. This can sometimes be a pain but can generally be worked around with decorators and the like.
      def my_update_function():
        # Some update code here
        ent.put()

      db.run_in_transaction(my_update_function)
    2. You can only update entities in the same entity group. This means all entities must be in the same ancestor tree. This can make updating entities with various relationships hard or impossible to do in a general way in a transaction.
    3. You cannot do filters in a transaction. This means you cannot do any kind of select, period. This means you cannot do the following:
      class ModelA(db.Model):
        pass

      class ModelB(db.Model):
        modela = db.ReferenceProperty(ModelA)

      def update_func():
        # Sorry this won't work
        modelas = ModelA.all()

        # This is the only thing that works
        modela = ModelA.get_by_id(123)

        # Jeez, you can't do this either!
        modelb = ModelB.all().filter('modela =', modela)
      You can only do gets based on the key of an entity. This means that if you have a relationship like the one above, you need to be able to derive the key of ModelB from the key of ModelA. And since you cannot choose the numeric ids with which entities are saved (numeric ids are always assigned), you will need to assign key names to both entities.
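
    Deriving one key from the other might look roughly like the sketch below. A dict keyed by (kind, key name) stands in for the datastore, and the "b-for-" naming scheme is just an assumption made for the example. A real transaction would also need both entities to be in the same entity group, which this sketch ignores.

```python
# A dict stands in for the datastore; the key-name scheme is an
# assumption made for illustration.

def modelb_key_name(modela_key_name):
    """Derive ModelB's key name from ModelA's so no query is needed."""
    return "b-for-%s" % modela_key_name


def load_pair(store, modela_key_name):
    """Fetch both entities purely by key, as a transaction would require."""
    modela = store.get(("ModelA", modela_key_name))
    modelb = store.get(("ModelB", modelb_key_name(modela_key_name)))
    return modela, modelb
```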

    All this makes transactions a bit of a pain in Appengine, but workable if you put a bit of effort into it. In the end you'll want to use key names for most every entity that matters, as current backup solutions for Appengine rely on key names to maintain the keys of entities when backing up and restoring. It wouldn't be too fun if all the URLs for an entity that had numeric ids changed after restoring the data from a backup.
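
    As for the decorator workaround mentioned in the first point, a minimal sketch could look like this. The run_in_transaction() defined here is only a local stand-in so the example runs anywhere; on Appengine you would call db.run_in_transaction instead. The transactional and increment names are my own.

```python
# run_in_transaction below is a local stand-in so the sketch runs anywhere;
# on App Engine it would be db.run_in_transaction.
import functools


def run_in_transaction(func, *args, **kwargs):
    """Stand-in that simply calls func; the real one adds commit and retry."""
    return func(*args, **kwargs)


def transactional(func):
    """Wrap func so that every call goes through run_in_transaction."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return run_in_transaction(func, *args, **kwargs)
    return wrapper


@transactional
def increment(counter, amount=1):
    counter["count"] += amount
    return counter["count"]
```

    Callers then just call increment(counter) and never have to remember to wrap the call themselves.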

  • Jaiku on Appengine

    Yesterday Google's Twitter-like service, Jaiku, was released as open source running on Google Appengine, so I decided to take it for a spin. It has a lot of neat parts like XMPP and Google Contacts integration, but what I'm most interested in is how it implements its publisher/subscriber model.

    I brought the code down from svn and tried to follow the instructions, but I got a "No module named django" error. One of the problems currently with appengine is that you have a limit of 1000 files you can upload. Because of this limit when deploying jaiku you need to zip a bunch of libraries into a zip file and use zipimport. Accordingly you have to prevent the source files from being uploaded because you get an error saying you can't upload more than 1000 files.

    The problem there is that the newest (1.1.9) SDK prevents you from loading modules and/or accessing files that match the skip_files directive in your app.yaml. This prevented me from importing django because it's a zipped module.
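
    If you haven't used zipimport before, the mechanism itself is easy to try with nothing but the standard library. The sketch below builds a throwaway zip containing a single module and imports it from the archive; the module name zipped_hello is made up, and this is not how Jaiku's Makefile builds its zips.

```python
# Builds a throwaway zip containing one module and imports it from the archive.
# The module name "zipped_hello" is made up for this example.
import os
import sys
import tempfile
import zipfile


def import_from_zip():
    tmpdir = tempfile.mkdtemp()
    archive = os.path.join(tmpdir, "libs.zip")
    zf = zipfile.ZipFile(archive, "w")
    try:
        zf.writestr("zipped_hello.py", "GREETING = 'hello from a zip'\n")
    finally:
        zf.close()
    # Putting the archive itself on sys.path lets the normal import
    # statement load modules out of it via zipimport.
    sys.path.insert(0, archive)
    try:
        import zipped_hello
    finally:
        sys.path.remove(archive)
    return zipped_hello.GREETING
```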

    At first I tried just zipping the files up using the zip_all command in the Makefile (make zip_all) but I still got the same error so I just commented out the relevant parts in app.yaml.

    skip_files: |
     ^(.*/)?(
     (app\.yaml)|
     (app\.yml)|
     (index\.yaml)|
     (index\.yml)|
     (#.*#)|
     (.*~)|
     (.*\.py[co])|
     (.*/RCS/.*)|
     # (\..*)|
     # (manage.py)|
     # (google_appengine.*)|
     # (simplejson/.*)|
     # (gdata/.*)|
     # (atom/.*)|
     # (tlslite/.*)|
     # (oauth/.*)|
     # (beautifulsoup/.*)|
     # (django/.*)|
     # (docutils/.*)|
     # (epydoc/.*)|
     # (appengine_django/management/commands/.*)|
     # (README)|
     # (CHANGELOG)|
     # (Makefile)|
     # (bin/.*)|
     # (images/ads/.*)|
     # (images/ext/.*)|
     # (wsgiref/.*)|
     # (elementtree/.*)|
     # (doc/.*)|
     # (profiling/.*)
     )$

    From there it should have worked, but I got an error about the pstats module. That just happened to not be installed on my machine, so I installed python-profiler and Jaiku ran from there.

  • Django

    I was thinking about using Django for one of my projects on GAE because it seems like a popular project and somewhat easy to use, but I'm not quite understanding yet why it's better to have helper functions rather than controller/handler classes like Pylons or GAE's normal WSGI handling has. With handler classes my controller might look like:

    from google.appengine.ext import webapp

    class MainHandler(webapp.RequestHandler):
      def get(self):
        # Read data from BigTable here
        self.response.out.write(outputhtml)

      def post(self):
        # Write data to BigTable here

        #redirect back to the url
        self.redirect(self.request.url)

    Whereas the django helper function might look like

    from django.http import HttpResponse, HttpResponseRedirect

    def mainview(request):
      if request.method == 'POST':
        # Write to BigTable Here
        return HttpResponseRedirect(request.path)
      elif request.method == 'GET':
        # Read from BigTable Here
        return HttpResponse(outputhtml)

    While the Django method has the potential to be a bit less verbose, it feels like it would be harder to do things correctly, such as factoring code. I also don't really like the conditional checks to see what kind of HTTP method was used. So either I would need to split GETs and POSTs into separate URLs or just live with the conditional checks.
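
    One possible middle ground, sketched here in plain Python, is a small callable class that dispatches to get() or post() by HTTP method, so the view can still be handed to anything that expects a function. FakeRequest and MethodView are hypothetical helpers written for this example, not part of Django, and responses are plain tuples to keep the sketch self-contained.

```python
# FakeRequest and MethodView are hypothetical helpers, not part of Django.

class FakeRequest(object):
    def __init__(self, method):
        self.method = method


class MethodView(object):
    """Callable view that routes to get(), post(), ... by HTTP method."""

    def __call__(self, request):
        handler = getattr(self, request.method.lower(), None)
        if handler is None:
            return (405, "method not allowed")
        return handler(request)


class MainView(MethodView):
    def get(self, request):
        # Read data here.
        return (200, "rendered page")

    def post(self, request):
        # Write data here, then redirect back.
        return (302, "redirect back")


mainview = MainView()  # usable wherever a view function is expected
```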

    Personally I feel better with the Pylons-ish controller/handler approach. Anyone have an opinion?

  • Google Developer Day 2008

    I went to Google Developer Day 2008 in Yokohama, Japan yesterday. The keynote speech was pretty much the exact same info as was given at the keynote at Google I/O, where Google announced their direction: moving the web forward as a platform.

    Keynote

    As with the Google I/O keynote, it was mentioned how Google feels that computing power and accessibility have kind of flip-flopped over the years. In the mainframe era you had computing power but no accessibility, in the PC era you had accessibility but lost relative computing power, and now in the web era we are getting back computing power in the form of cloud computing but we are losing accessibility to those resources. They plan on fixing this with the so-called three Cs: Client, Connectivity and Cloud.

    The first refers to the browser, so Google wants to make the browser richer in order to give us accessibility to the computing power that they can provide. They are doing this with Google Gears and some other handy browser plugins.

    Connectivity refers to allowing everyone equal access to resources and making sure everyone can connect. This means making sure that internet lines are fast, airwaves are open, etc. They see mobiles as big in the future, so they hope to help the connectivity problem with Android, their free, open operating system for mobiles.

    Cloud refers to their vast data centers. They hope to give access to these resources through products like Appengine where developers can access the vast resources and scalability that Google's data centers provide.

    Appengine Hackathon

    In the afternoon I attended the Appengine Hackathon, which was presided over by Brett Slatkin, who is none other than the guy in the Appengine demo video. It was interesting because, from the e-mails I received about the event, I figured it would be in Japanese, but it ended up being entirely in English. Many of the Japanese folks had trouble following along, so I tried to help where I could.

    In the beginning, Brett talked about Appengine and used an example wiki as a demo app. Then we went into coding our projects. At the end some folks showed off their applications. Despite the language barriers, many folks came up with some really original and cool ideas. The first was created by a Google engineer, who said he would set the bar low but ended up with one of the better applications. His app read calendar events from RSS and allowed users to add comments to them. He also implemented memcache support. There was an application with the idea of attaching pictures based on the hostility/mood level of a chat message or Twitter tweet. There was a social bookmarking app, and an app to allow live translating of a Django application.

    For what it's worth I presented my application which I hope to make into a workable form application builder. I haven't uploaded it yet so you'll have to make do with my first Appengine application, a prefix calculator with a simple rest api.

    Dinner

    Afterwards I went out to dinner with a number of folks who participated in the Hackathon. It turned out to be a lot of fun and I made a lot of new friends many of whom are now in my twitter contacts ;) All in all a hugely satisfying experience.

  • Google Developer Day 2008

    Google Developer Day Japan 2008 is being held on June 10th at Google's offices in Shibuya and I've registered to attend this year. There were a number of sessions that people could take part in but I decided to register for a Google appengine hackathon. I'm pretty curious about appengine since I've been working at becoming more familiar with really newly evolving technologies and not necessarily ones that have been around a while. Newly evolving technologies is something I've always felt I've had to catch up on since starting programming in high school. Going to high school with folks like Bob Ippolito (Mochikit, simplejson) and Konrad Rokicki who started coding stuff when they were in early middle school didn't help my self esteem.

    Anyway, in the spirit of learning about Appengine I took a dive into the documentation and learned a few of Appengine's silly limitations, but I came up with a simple application that utilizes the simple Python library I created for prefix back in college. I put it up in my mercurial repository under prefix-appengine if you care to take a look.

    The main work is done in two handlers which are essentially the controller part of the MVC pattern. One simply renders the page as a template, which is really simple since there isn't any template code, and the other implements a simple rest API that I use for an AJAX call to evaluate an expression given by the user. Using JSON seemed like a waste since there was only one returned value.

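    from django.utils import simplejson
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp import template

    import prefix.parser  # the prefix library; this exact import path is assumed

    class PrefixHandler(webapp.RequestHandler):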
        def get(self):
            self.response.out.write(template.render("main.tpl", {}))
       
        # def post(self):
        #     self.redirect('/')

    class EvalHandler(webapp.RequestHandler):
        def get(self):
            expression = self.request.get("exp")
            values = {}
            try:
                output = prefix.parser.parse(expression).evaluate()
                values = {
                    "value": output
                }
            except ValueError, arg:
                output = "ERROR: " + str(arg)
                values = {
                    "error": output
                }
            self.response.out.write(simplejson.dumps(values))
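
    The shape of the JSON the eval handler returns can be tried outside Appengine with a small stand-in. Here evaluate_prefix() is a hypothetical integer-only evaluator written just for this sketch (it is not my prefix library), and the standard json module is used in place of simplejson.

```python
# evaluate_prefix is a hypothetical stand-in for the prefix library,
# and the stdlib json module replaces simplejson.
import json
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_prefix(expression):
    """Evaluate a space-separated prefix expression such as '+ 1 * 2 3'."""
    tokens = expression.split()

    def parse(pos):
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        if token in OPS:
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return OPS[token](left, right), pos
        try:
            return int(token), pos + 1
        except ValueError:
            raise ValueError("bad token: %s" % token)

    value, pos = parse(0)
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return value


def eval_response(expression):
    """Build the JSON body: {'value': ...} on success, {'error': ...} on failure."""
    try:
        values = {"value": evaluate_prefix(expression)}
    except ValueError as arg:
        values = {"error": "ERROR: " + str(arg)}
    return json.dumps(values)
```

    The client only has to check whether the response has a value or an error key, which is what the javascript does.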

    The rest of the code is in the javascript, which I just wrote straight into the template file because I was lazy. The javascript uses jQuery to do an AJAX call when the button is pressed and update the HTML DOM.

    var lastvalue = "";

    $(document).ready(function() {
      $("#eval").click(function() {
        expression = $("#exp").val();
        $("#output").html("Loading..");
        uri = "eval?exp=";
        uri += encodeURIComponent(expression.replace("Ans", lastvalue));
        uri = uri.replace(/%20/g, '+');
        $.getJSON(uri,
          // Callback
          function (data) {
            output = "<font color='#FF0000'>ERROR: Invalid response from server</font>";
            if (data.value != null) {
              output = expression + " = <font color='#00FF00'>" + data.value + "</font>";
              lastvalue = data.value;
            } else {
              if (data.error && data.error.length>0) {
                output = "<font color='#FF0000'>"+ data.error +"</font>";
              }
            }
            $("#output").html(output);
          }
        );
      });
    });