Saturday, December 27, 2008

Ajax with Python - Combining PyJS and Appengine with json-rpc

I recently (re)discovered pyjs - also called Pyjamas - a tool that supports development of (client-side) Ajax applications in Python by compiling Python code to JavaScript (Pyjamas is inspired by GWT, which supports writing Ajax applications in Java).

pyjs' kitchensink comes with a JSON-RPC example; this posting shows how to use Appengine as a JSON-RPC server for the pyjs json-rpc example.

(Rough) Steps:
1) download appengine SDK and create an appengine application (in the dashboard)
2) download pyjs 
3) Replace pyjs example/jsonrpc/JSONRPCExample.py with the following code:

from ui import RootPanel, TextArea, Label, Button, HTML, VerticalPanel, HorizontalPanel, ListBox


from JSONService import JSONProxy


class JSONRPCExample:
    def onModuleLoad(self):
        self.TEXT_WAITING = "Waiting for response..."
        self.TEXT_ERROR = "Server Error"
        self.remote_py = UpperServicePython()
        self.status=Label()
        self.text_area = TextArea()
        self.text_area.setText(r"Please uppercase this string")
        self.text_area.setCharacterWidth(80)
        self.text_area.setVisibleLines(8)
        self.button_py = Button("Send to Python Service", self)
        buttons = HorizontalPanel()
        buttons.add(self.button_py)
        buttons.setSpacing(8)
        info = r'This example demonstrates calling an appengine upper(case) method with JSON-RPC from javascript (i.e. Python code compiled to javascript with pyjs).'
        panel = VerticalPanel()
        panel.add(HTML(info))
        panel.add(self.text_area)
        panel.add(buttons)
        panel.add(self.status)
        RootPanel().add(panel)


    def onClick(self, sender):
        self.status.setText(self.TEXT_WAITING)
        text = self.text_area.getText()
        if self.remote_py.upper(text, self) < 0:
            self.status.setText(self.TEXT_ERROR)


    def onRemoteResponse(self, response, request_info):
        self.status.setText(response)


    def onRemoteError(self, code, message, request_info):
        self.status.setText("Server Error or Invalid Response: ERROR " + str(code) + " - " + message)


class UpperServicePython(JSONProxy):
    def __init__(self):
        JSONProxy.__init__(self, "/json", ["upper"])



4) Use the following code for the appengine app:
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import logging
from django.utils import simplejson

class JSONHandler(webapp.RequestHandler):
  def json_upper(self,args):
    return [args[0].upper()]

  def post(self):
    args = simplejson.loads(self.request.body)
    json_func = getattr(self, 'json_%s' % args[u"method"])
    json_params = args[u"params"]
    json_method_id = args[u"id"]
    result = json_func(json_params)
    # reuse args to send result back
    args.pop(u"method")
    args["result"] = result[0]
    args["error"] = None # IMPORTANT!!
    self.response.headers['Content-Type'] = 'application/json'
    self.response.set_status(200)
    self.response.out.write(simplejson.dumps(args))

application = webapp.WSGIApplication(
                                     [('/json', JSONHandler)],
                                     debug=True)

def main():
  run_wsgi_app(application)

if __name__ == "__main__":
  main()
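For reference, here is a sketch of the round trip the handler above performs. The exact payload pyjs' JSONProxy sends may differ slightly, but it follows the JSON-RPC 1.0 shape (method/params/id), and the server-side transformation is the same as in JSONHandler.post():

```python
import json

# What the client (JSONProxy) posts to /json - JSON-RPC 1.0 style
request_body = json.dumps({"method": "upper",
                           "params": ["Please uppercase this string"],
                           "id": 1})

# The same transformation JSONHandler.post() performs:
# dispatch on "method", compute the result, reuse the dict for the reply
args = json.loads(request_body)
result = [args["params"][0].upper()]
args.pop("method")
args["result"] = result[0]
args["error"] = None  # the client checks this field
response_body = json.dumps(args)
print(response_body)
```

Note that the response echoes back the request's "id", which is how the client matches responses to outstanding calls.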
5) Compile the pyjs code from 3) and create a static dir in the appengine app to hold the compiled output (declared in app.yaml)
6) Use dev_appserver.py to test locally or appcfg.py to deploy on appengine
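For step 5, a minimal app.yaml could look roughly like this (the app id and the script filename are assumptions; adjust static_dir to wherever the pyjs compiler put the generated javascript/html):

```yaml
application: your-app-id   # assumption: your appengine application id
version: 1
runtime: python
api_version: 1

handlers:
- url: /json
  script: main.py          # the JSON-RPC handler from step 4 (assumed filename)
- url: /static
  static_dir: static       # put the pyjs-compiled javascript/html here
```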

Facebook application
Using this recipe it was easy to create a facebook app of the example in this posting - check it out at http://apps.facebook.com/testulf/


Alternative backend - using webpy


This shows how to use webpy as the backend: just put the javascript/html resulting from the pyjs compile into the static/ directory. It is very similar to the previous approach; the main difference is the webpy-specific request handling below.
import web
import json
urls = (
'/json', 'jsonhandler'
)
app = web.application(urls, globals())

class jsonhandler:
  def json_upper(self,args):
    return [args[0].upper()]

  def json_markdown(self,args):
    return [args[0].lower()]

  def POST(self):
    args = json.loads(web.data())
    json_func = getattr(self, 'json_%s' % args[u"method"])
    json_params = args[u"params"]
    json_method_id = args[u"id"]
    result = json_func(json_params)
    # reuse args to send result back
    args.pop(u"method")
    args["result"] = result[0]
    args["error"] = None # IMPORTANT!!
    web.header("Content-Type", "application/json")
    return json.dumps(args)

if __name__ == "__main__":
  app.run()

Wednesday, December 17, 2008

Simulating Mamma Mia under the xmas tree (with Python)

Norway has a population of ~4.7 million, and "Mamma Mia!" has in just a few weeks sold ~600 thousand DVD/Blu-ray copies (> 12% of the population, i.e. breaking every sales record, perhaps with the exception of Sissel Kyrkjebø's Christmas Carol album, which has sold a total of ~900 thousand copies in the same population - but that was over a period of 21 years). In the UK it has sold more than 5 million (in a population of 59 million, i.e. > 8% of the population).

Well, to the point. Guesstimating that there will be ~2 million xmas trees in Norway, one can assume that many of the trees will have (much) more than one "Mamma Mia!" DVD/Blu-ray underneath them; the question is how many?

Simulating Mamma Mia under the xmas tree
.. Before you start recapping probability theory, combinatorics, the birthday paradox, multiplication of improbabilities and whatnot, how about finding an alternative solution using simulation?

A simple simulation model could be to assume that for every Mamma Mia DVD/Blu-ray copy there is a lottery in which all trees participate, and then finally (or incrementally) count how many copies each tree won.

A friend wrote a nice Python script to simulate this:

import random

NUM_XMAS_TREES = 2000000
NUM_MAMMA_MIAS = 600000

tree_supplies = {}
for mamma_mia in range(NUM_MAMMA_MIAS):
    winner_tree = random.randint(0, NUM_XMAS_TREES - 1)  # randint is inclusive on both ends
    tree_supplies[winner_tree] = tree_supplies.get(winner_tree,0) + 1

tree_stats = {}
for tree in tree_supplies:
    tree_stats[tree_supplies[tree]] = tree_stats.get(tree_supplies[tree], 0) + 1

print tree_stats

Results:

$ for k in `seq 1 10`; do echo -n "$k " ; python mammamia.py; done
1 {1: 443564, 2: 67181, 3: 6618, 4: 510, 5: 36}
2 {1: 444497, 2: 66811, 3: 6543, 4: 520, 5: 32, 6: 2}
3 {1: 444796, 2: 66376, 3: 6738, 4: 510, 5: 36, 6: 3}
4 {1: 444499, 2: 66750, 3: 6652, 4: 469, 5: 30, 6: 2, 7: 1}
5 {1: 444347, 2: 66717, 3: 6697, 4: 494, 5: 28, 6: 2}
6 {1: 444511, 2: 66389, 3: 6763, 4: 551, 5: 40, 6: 3}
7 {1: 443914, 2: 66755, 3: 6785, 4: 511, 5: 33, 6: 2}
8 {1: 444747, 2: 66558, 3: 6667, 4: 484, 5: 40}
9 {1: 444553, 2: 66703, 3: 6631, 4: 497, 5: 32}
10 {1: 443903, 2: 66853, 3: 6774, 4: 487, 5: 23, 6: 1}

Conclusion

So we see that in run 4 there was one xmas tree that, according to the simulation model, got 7(!) Mamma Mia DVD/Blu-rays underneath it, but overall the simulation shows that a maximum of 5 or 6 copies is more likely (assuming the model is right).
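As a sanity check of the simulation, note that the lottery model is a binomial distribution that is well approximated by a Poisson distribution with rate lambda = 600000/2000000 = 0.3 copies per tree; the expected number of trees with exactly k copies is then N * exp(-lambda) * lambda^k / k!:

```python
import math

NUM_XMAS_TREES = 2000000
NUM_MAMMA_MIAS = 600000
lam = float(NUM_MAMMA_MIAS) / NUM_XMAS_TREES  # 0.3 copies per tree on average

# Expected number of trees receiving exactly k copies (Poisson approximation)
for k in range(1, 8):
    expected = NUM_XMAS_TREES * math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, round(expected, 1))
```

This predicts roughly 444,500 trees with one copy, 66,700 with two, 30 with five, about 1.5 with six, and only ~0.06 with seven - which matches the simulated runs nicely (including 7 copies showing up in just one of the ten runs).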

Regarding the simulation model, it is probably way too simplistic: it does not take into account people buying Mamma Mia for themselves (or not as xmas gifts), a likely skewness in the number of gifts per christmas tree, interaction between buyers, etc. But it can relatively easily be extended with code to make it more realistic. Check out Markov Chain Monte Carlo simulation for more info on how to create more realistic simulation models.

Wednesday, December 3, 2008

cinpy - or C in Python

cinpy is a tool that lets you write C code inside your Python code (with the help of ctypes, included in modern Python versions). When you execute your Python program, the C code is compiled on the fly using the Tiny C Compiler (TCC).
In this posting I will describe:
1) installation and testing cinpy
2) a simple benchmark (c-in-py vs python)
3) compare performance with gcc (c-in-py vs gcc)
4) measure cinpy (on-the-fly) compilation time
5) how to dynamically change cinpy methods

1. How to install and try cinpy (note: also found in cinpy/tcc README files)
1) download, uncompress and compile TCC
./configure
make
make install
gcc -shared -Wl,-soname,libtcc.so -o libtcc.so libtcc.o

2) download, uncompress and try cinpy
cp ../tcc*/*.so .
python cinpy_test.py  # you may have to comment out or install psyco 

2. Sample performance results (on a x86 linux box):

python cinpy_test.py
Calculating fib(30)...
fibc : 1346269 time: 0.03495 s
fibpy: 1346269 time: 2.27871 s
Calculating for(1000000)...
forc : 1000000 time: 0.00342 s
forpy: 1000000 time: 0.32119 s

Using cinpy, the fibc (Fibonacci) method was ~65 times faster than fibpy, and the forc (loop) method was ~93 times faster than forpy - not bad.

3. How does cinpy (compiled with tcc) compare to gcc performance?
Copying the C fib() method into a standalone C program with a main() and compiling it with gcc:
$ time fibgcc


fib(30) = 1346269

real    0m0.016s
user    0m0.020s
sys     0m0.000s


GCC gives roughly twice as fast code as cinpy/tcc (0.016 s vs. 0.035 s).




4. How long does it take for tcc to compile cinpy methods on the fly?


#!/usr/bin/env python
# cinpy_compile_performance.py
import ctypes
import cinpy
import time
t=time.time
t0 = t()
fibc=cinpy.defc(
     "fib",
     ctypes.CFUNCTYPE(ctypes.c_int,ctypes.c_int),
     """
     int fib(int x) {
       if (x<=1) return 1;
       return fib(x-1)+fib(x-2);
     }
     """)
t1 = t()
print "Calculating fib(30)..."
sc,rv_fibc,ec=t(),fibc(30),t()
print "fibc :",rv_fibc,"time: %6.5f s" % (ec-sc)
print "compilation time = %6.5f s" % (t1-t0)

python cinpy_compile_performance.py

Calculating fib(30)...
fibc: 1346269 time: 0.03346 s
compilation time: 0.00333 s


So compilation (and linking) time is about 3.3 ms, which is reasonably good - not a lot of overhead! (note: TCC is benchmarked to compile, assemble and link 67 MB of code in 2.27 s)


5. "Hot-swap" replacement of cinpy code?
Let us assume you have a system with a "pareto" situation: 80-99% of the code doesn't need to be fast (written in Python), but 1-20% needs to be really high performance (written in C using cinpy), and you need to frequently change the high-performance code. Can that be done? An example of such a system could be mobile (software) agents.


Sure, all you need to do is wrap your cinpy definition in a string and run exec on it, like this:

origmethod="""
fibc=cinpy.defc("fib",ctypes.CFUNCTYPE(ctypes.c_int,ctypes.c_int),
          '''
          int fib(int x) {
              if (x<=1) return 1;
              return fib(x-1)+fib(x-2);
          }
          ''')
"""
# add an offset to the Fibonacci method
alternatemethod = origmethod.replace("+", "+998+")
print alternatemethod
exec(alternatemethod)  # replaces the original fibc definition
print fibc(2) # = 1000, i.e. fib(1) + 998 + fib(0)
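The same exec-based hot-swap trick works for plain Python functions too, which makes it easy to try without cinpy/tcc installed; a minimal sketch (the function and variable names are mine, not from cinpy):

```python
# Hot-swap a function definition at runtime with exec
origmethod = """
def fib(x):
    if x <= 1:
        return 1
    return fib(x-1) + fib(x-2)
"""
namespace = {}
exec(origmethod, namespace)
print(namespace["fib"](2))        # plain fib(2) -> 2

# add an offset to the addition, as in the cinpy example above
alternatemethod = origmethod.replace("+", "+ 998 +")
exec(alternatemethod, namespace)  # replaces fib in the namespace
print(namespace["fib"](2))        # fib(1) + 998 + fib(0) -> 1000
```

Executing into an explicit namespace dict keeps the swapped definition contained instead of mutating your module's globals.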
Conclusion:
cinpy ain't bad.

Remark: depending on the problem you are solving (e.g. if it is primarily network IO bound and not CPU bound), becoming 1-2 orders of magnitude faster (cinpy vs. pure Python) is probably fast enough CPU-wise; the further doubling from GCC may not matter (network IO-wise Python performs quite alright, e.g. with Twisted).