Tuesday, August 3, 2010

Meta-Programming in RPython

What is Restricted Python?

Its hard to find a good definition, mainly because there is no formal definition. RPython was born as a side effect of the PyPy interpreter needing a way to compile itself. Since then it has become a general purpose language, but apparently has not been written about much other than a few academic papers [1], [2]. Meta-programming is RPython most exciting feature, more on that below.

Language Restrictions/Definition:
1. Lists and Dicts must contain a compatible type.
(objects with a common base class are compatible)
2. Function arguments types must not be changed after the first call, each call must be consistent with the others in the ordering of types.
(this also applies to function returns)
3. Subclasses must redefine functions that interact with an attribute they have made a new type.*
4. String and char are different, any string of length 1 becomes a char. (some builtin functions accept only a char)
5. Tuples may have incompatible types, but can not be iterated over, yet. (the error messages suggest the iteration limitation is going away)
6. Only __init__ and __del__ are allowed, so no custom attribute access or operator overloading.
7. Globals are considered constants (use singletons for global changeable state)
8. No runtime definition of functions or classes.
9. Slicing can not use negative indexes except for [:-1]
10. Some builtin functions requires literals, like getattr, for example:

arg = 'x'
getattr( o, arg )

There are a few other rules and exceptions not listed above, but those are small and the RPython translator/compiler is going to give you clear errors when you violate them. Getting started with RPython it is easy to make a mistake somewhere and violate rules 1-3, the errors you will get from breaking them are a bit cryptic as well:

a. Exception: don't know how to simply render py object:
b. pypy.rpython.error.TyperError: don't know how to convert from to
c. AttributeError: no field 'll_newlist'
d. pypy.rpython.error.TyperError: reading attribute '__init__': no common base class for [type]

*Rule 3 is unusual and easy to overlook, it means we must be careful with functions defined in our base class that interact with attributes we have redefined in subclasses that are of a new type. In other words, a function defined in the base class expects its the attributes of self to always have those fixed types, but if the subclasses changes the type, the function breaks. In unrestricted Python we would normally want to put as many shared functions in the base class as possible to avoid code duplication, rule 3 turns this upside down. Consider the following where we use wrapper objects for generic types and note that we have put the unwrap function in each of the subclasses.

class Type1(object): pass
class Type2(object): pass

class Abstract(object):
def __init__(self): # be careful with init, do not try to define defaults for what will have different types in subclasses!
#self.x = 'never define what x is in the base class'
self.y = 'this is safe'
def failure(self): self.x = 1.0

class O(object):
def __init__(self, arg='string'): self.a = arg

class X( Abstract ):
def __init__(self):
self.x = Type1()
def unwrap(self): return self.x

class Y( Abstract ):
def __init__(self):
self.x = Type2()
def unwrap(self): return self.x

class Z( Abstract ):
def __init__(self):
self.x = 10.0
def unwrap(self): return self.x

class W( Abstract ):
def __init__(self):
self.x = O
def unwrap(self): return self.x
def new_instance(self, arg): return self.x(arg)

def test_subclasses_fails():
c = Abstract()
c.failure() # this breaks because the type can not be known from the base class
a = X()
b = Y()

def test_subclasses_works():
a = X()
b = Y()
c = Z()
lst = [a,b] # sharing the same base class means they can co-exist in a list
for item in lst: print item
print a.x
print b.x
print c.x
o = W().x()
print o
print a.unwrap()
print b.unwrap()
o2 = W().new_instance( 'hello world' )
print o2.a
print 'test shared complete'

Another case that can easily lead to rule breaking is when dealing with *args which is a tuple, and since tuples can store incompatibles, we may be tempted to use it to pass variable amounts of incompatible types; but they can not be iterated over, and casting args to a list will not work either because lists must contain only compatible types. Tuple indexing must also be done with a literal constant as well, so for loop will fail:
def func( *args ):
for i in range(len(args)):
a = args[i]

Above fails, so the only way to get at items in args is to index each one with a literal like this, not pretty!:
n = len(args)
if n > 1: args[0]
if n > 2: args[1]
if n > 3: args[2]


If we have to code that way, then it seems like we should give up trying to pass variable numbers of incompatible types. However, unlike many other languages which are parsed only from source code, RPython uses live objects and introspection; and before compiling we have the chance to generate classes, modify functions, create new globals, and more, all from unrestricted Python. Using meta-programming we can easily bend some of the restrictions of RPython, in this case we are going to unroll the loop and inline valid RPython code.

STAR_TYPES = 'bool int float str tuple list Meta ID'.split()
class ID(object):
def __init__(self,value='undefined'): self.value = value

class Meta(object): pass

class SimplePack( Meta ): # would be useful to turn this into a minipickler
def init_head(self):
self.ID = '0'
self._init_string = ''

def init_bool(self, v ): self._init_string += '%s%s' %(v, ARGUMENT_SEP)
def init_int(self, v ): self._init_string += '%s%s' %(v, ARGUMENT_SEP)
def init_float(self, v ): self._init_string += '%s%s' %(v, ARGUMENT_SEP)
def init_str(self, v ): self._init_string += '"%s"%s' %(v, ARGUMENT_SEP)
def init_tuple(self, v ): self._init_string += '%s%s' %(v, ARGUMENT_SEP)
def init_list(self, v ): self._init_string += '%s%s' %(v, ARGUMENT_SEP)
def init_Meta(self, v ): self._init_string += '@%s%s' %(v.ID, ARGUMENT_SEP); print 'got meta'
def init_ID(self, v ): self.ID = v.value; print 'got id'

The SimplePack subclass above is going to handle each possible type that may be passed in *args from our generated function. gen_star_func and gen_switch_block are going to create the unrolled function for us, that can be very large depending on what maxitems is set to as we will see below.

def gen_switch_block( prefix, indent, index ):
indent += ' '
r = ''
for type in STAR_TYPES:
if not r:
r += '%sif isinstance( args[%s], %s ): self.%s_%s(args[%s])\n' %(indent,index,type, prefix,type,index)
r += '%selif isinstance( args[%s], %s ): self.%s_%s(args[%s])\n' %(indent,index,type, prefix,type,index)
r += '%selse: print "unknown type", args[%s]\n' %(indent,index)
return r

def gen_star_func( name, handler_prefix='init', head=None, tail=None, maxitems=16 ):
ident = ' '
if head: body = ' self.%s\n' %head
else: body = ''
body += ' n = len(args)\n'

for i in range( maxitems ):
body += '%sif n > %s:\n%s' %(ident, i, gen_switch_block(handler_prefix, ident,i) )
ident += ' '
if tail: body += '\n ' + tail

e = 'def star_func(self, *args):\n' + body
print e
exec( e )
star_func.func_name = name
return star_func

The output of gen_star_func simply checks for each possible type, and forwards it to a handler. In terms of speed there should not be a big expense, since the next switch block is only executed if the length of *args is greater, we lose some memory but it is reasonable as long as maxitems is not too high. As you can see below the output of gen_star_func greatly helps keeps our code maintainable not having to copy and paste into every function where we want to use *args and variable types.

def star_func(self, *args):
n = len(args)
if n > 0:
if isinstance( args[0], bool ): self.init_bool(args[0])
elif isinstance( args[0], int ): self.init_int(args[0])
elif isinstance( args[0], float ): self.init_float(args[0])
elif isinstance( args[0], str ): self.init_str(args[0])
elif isinstance( args[0], tuple ): self.init_tuple(args[0])
elif isinstance( args[0], list ): self.init_list(args[0])
elif isinstance( args[0], Meta ): self.init_Meta(args[0])
elif isinstance( args[0], ID ): self.init_ID(args[0])
else: print "unknown type", args[0]
if n > 1:
if isinstance( args[1], bool ): self.init_bool(args[1])
elif isinstance( args[1], int ): self.init_int(args[1])
elif isinstance( args[1], float ): self.init_float(args[1])
elif isinstance( args[1], str ): self.init_str(args[1])
elif isinstance( args[1], tuple ): self.init_tuple(args[1])
elif isinstance( args[1], list ): self.init_list(args[1])
elif isinstance( args[1], Meta ): self.init_Meta(args[1])
elif isinstance( args[1], ID ): self.init_ID(args[1])
else: print "unknown type", args[1]
if n > 2:
...(to maxitems)

def gen_class( name, base=Meta, items={} ):
cls = types.ClassType(str(name), bases=(base,), dict=items)
globals().update( {name:cls} )
return cls

def gen_metas( names ):
generated = []
init = gen_star_func( '__init__', handler_prefix='init', head='init_head()' )
for name in names.split():
items = {'__init__':init}
cls = gen_class( name, base=SimplePack, items=items )
generated.append( cls )
return generated
GEN_METAS = gen_metas( 'Meta1 Meta2 Meta3' )

def test_meta():
m1 = Meta1()
myid = ID('myid')
print myid
m2 = Meta2('a', 'b', 1.0, 'string', m1, myid )
print m2._init_string
m3 = Meta3( m1, m2 )
print m3._init_string

The test above test_meta takes the variable argument types and packs them into a string that could be sent over a pipe or socket for communication with another process.

To run these tests you need to download the pypy source code, and set PATH2PYPY to where pypy is.
Below the tests function is the pypy entry point, to run the different tests above uncomment test_*
You can download the source code for this whole demo here.

PATH2PYPY = 'pypy'
sys.path.append(PATH2PYPY) # assumes you have pypy dist in a subfolder, you may need to rename this pypy-trunk

def tests( outside ):

from pypy.translator.interactive import Translation
t = Translation( tests )
t.annotate([str]); t.rtype()
f = t.compile_c()
f('from the outside')

print( 'end of program' )

No comments:

Post a Comment