Sunday, April 22, 2012

Rpython to LLVM

Psyco and Unladen Swallow were the first attempts at a just-in-time (JIT) compiler for Python, but both projects have stopped, leaving standard Python with no good JIT solution. So I started investigating how hard it would be to make a JIT for Python using Rpython and LLVM. My first, highly experimental implementation of Rpython-to-LLVM shows very fast JIT performance on a simple loop benchmark: 4x faster than PyPy, 200x faster than Python2, and roughly 260x faster than Python3.

Test Function

def simple_test(a, b):
    c = 0
    while c < 100000*100000:
        c += a + b
    return c
The test function is simply a huge loop that adds to and returns a 64-bit integer. The test was performed on a 2.4 GHz quad-core AMD machine with 4 GB of RAM. Average test times were:
  • Rpython-to-LLVM = 2 seconds
  • PyPy1.8 (with warm JIT) = 8 seconds
  • Python2.7.2 = 400 seconds
  • Python3.2.2 = 530 seconds
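For readers who want to reproduce the plain-Python side of this comparison, here is a minimal sketch of one way to time the loop and to derive speedup ratios from the averages above. It is not the harness used for the numbers in this post: the `limit` parameter, the `timed` helper, and the arguments `a=1, b=2` are assumptions added for illustration, and the loop bound is scaled down so the run finishes quickly.

```python
import time


def simple_test(a, b, limit=100000 * 100000):
    # Same loop as the test function; the bound is made a parameter
    # (an assumption for illustration) so a short run is possible.
    c = 0
    while c < limit:
        c += a + b
    return c


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Scaled-down run; a=1 and b=2 are arbitrary example arguments.
result, seconds = timed(simple_test, 1, 2, limit=3000000)
print("result=%d in %.3f s" % (result, seconds))

# Average times in seconds, copied from the list above.
averages = {
    "Rpython-to-LLVM": 2,
    "PyPy1.8 (warm JIT)": 8,
    "Python2.7.2": 400,
    "Python3.2.2": 530,
}

# Speedup of Rpython-to-LLVM relative to each implementation.
fastest = averages["Rpython-to-LLVM"]
speedups = {name: t / fastest for name, t in averages.items()}
for name, factor in speedups.items():
    print("%s: %gx" % (name, factor))
```

From these rounded averages the Python3 ratio works out to 265, which the introduction quotes as 260x.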

Building The JIT

The first challenge in this project was building the code that traverses the Rpython flow-graph ("flow object space") and converts it into LLVM format. For each Rpython flow-graph block a new LLVM basic-block is created, and for each operation in the block a new LLVM instruction is created. Blocks that loop and modify a variable require some extra work: these mutable variables are treated as stack allocations, and the LLVM optimization pass PROMOTE_MEMORY_TO_REGISTER then replaces the costly stack allocations with fast registers. It is interesting to see what the LLVM IR looks like for the simple function used in this test, before and after the PROMOTE_MEMORY_TO_REGISTER optimization.
Raw LLVM IR
define i64 @simple_test(i64 %a_1, i64 %b_1) {
entry:
  %st_a_1 = alloca i64                            ;  [#uses=2]
  store i64 %a_1, i64* %st_a_1
  %st_b_1 = alloca i64                            ;  [#uses=2]
  store i64 %b_1, i64* %st_b_1
  %st = alloca i64                                ;  [#uses=1]
  store i64 0, i64* %st
  %st_v2 = alloca i64                             ;  [#uses=4]
  store i64 %a_1, i64* %st_v2
  br label %while_loop

while_loop:                                       ; preds = %while_loop, %entry
  %a_0 = load i64* %st_a_1                        ;  [#uses=1]
  %b_0 = load i64* %st_b_1                        ;  [#uses=1]
  %v0 = add i64 %a_0, %b_0                        ;  [#uses=1]
  %v1 = load i64* %st_v2                          ;  [#uses=1]
  %v2 = add i64 %v1, %v0                          ;  [#uses=2]
  store i64 %v2, i64* %st_v2
  %v3 = icmp ult i64 %v2, 10000000000             ;  [#uses=1]
  br i1 %v3, label %while_loop, label %else_return

else_return:                                      ; preds = %while_loop
  %0 = load i64* %st_v2                           ;  [#uses=1]
  ret i64 %0
}
LLVM IR (after PROMOTE_MEMORY_TO_REGISTER)
define i64 @simple_test(i64 %a_1, i64 %b_1) {
entry:
  br label %while_loop

while_loop:                                       ; preds = %while_loop, %entry
  %st_v2.0 = phi i64 [ %a_1, %entry ], [ %v2, %while_loop ] ;  [#uses=1]
  %v0 = add i64 %a_1, %b_1                        ;  [#uses=1]
  %v2 = add i64 %st_v2.0, %v0                     ;  [#uses=3]
  %v3 = icmp ult i64 %v2, 10000000000             ;  [#uses=1]
  br i1 %v3, label %while_loop, label %else_return

else_return:                                      ; preds = %while_loop
  ret i64 %v2
}
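The block-by-block translation described above can be sketched in a few lines of Python. This is a toy that emits IR as text, not the actual translator: `Op`, `Block`, and `emit_function` are hypothetical stand-ins invented for illustration, not Rpython or PyLLVM classes. It handles only `i64` values and mirrors the older `load i64* %ptr` syntax used in the listings above (newer LLVM releases spell loads differently).

```python
from collections import namedtuple

# Hypothetical stand-ins for flow-graph data; not the real Rpython classes.
Op = namedtuple("Op", "result opname args")  # result = opname(*args)
Block = namedtuple("Block", "name ops cond if_true if_false ret")


def emit_function(name, params, mutable, blocks):
    """Emit unoptimized LLVM IR text, one basic block per flow-graph block.

    `mutable` maps each loop-modified variable to its initial value; every
    such variable gets a stack slot (alloca) instead of an SSA register.
    """
    counter = [0]
    out = []

    def fresh(prefix):
        # Make a unique register name such as %c.1, %c.2, ...
        counter[0] += 1
        return "%%%s.%d" % (prefix, counter[0])

    def read(arg):
        # Constants are used directly, mutable variables are loaded from
        # their stack slot, anything else is already an SSA register.
        if isinstance(arg, int):
            return str(arg)
        if arg in mutable:
            reg = fresh(arg)
            out.append("  %s = load i64* %%st_%s" % (reg, arg))
            return reg
        return "%" + arg

    args = ", ".join("i64 %" + p for p in params)
    out.append("define i64 @%s(%s) {" % (name, args))
    out.append("entry:")
    for var, init in mutable.items():
        out.append("  %%st_%s = alloca i64" % var)
        out.append("  store i64 %s, i64* %%st_%s" % (init, var))
    out.append("  br label %" + blocks[0].name)

    for block in blocks:
        out.append("")
        out.append(block.name + ":")
        for op in block.ops:
            operands = ", ".join([read(a) for a in op.args])
            is_mutable = op.result in mutable
            reg = fresh(op.result) if is_mutable else "%" + op.result
            out.append("  %s = %s i64 %s" % (reg, op.opname, operands))
            if is_mutable:
                # Writes to a mutable variable go back to its stack slot.
                out.append("  store i64 %s, i64* %%st_%s" % (reg, op.result))
        if block.ret is not None:
            value = read(block.ret)
            out.append("  ret i64 " + value)
        else:
            out.append("  br i1 %%%s, label %%%s, label %%%s"
                       % (block.cond, block.if_true, block.if_false))
    out.append("}")
    return "\n".join(out)


loop = Block("while_loop",
             [Op("v0", "add", ["a", "b"]),
              Op("c", "add", ["c", "v0"]),
              Op("v3", "icmp ult", ["c", 100000 * 100000])],
             cond="v3", if_true="while_loop", if_false="else_return", ret=None)
done = Block("else_return", [], cond=None, if_true=None, if_false=None, ret="c")

ir = emit_function("simple_test", ["a", "b"], {"c": 0}, [loop, done])
print(ir)
```

The output has the same shape as the raw listing: every read of `c` becomes a load and every write becomes a store, which is exactly the pattern that PROMOTE_MEMORY_TO_REGISTER later collapses into a single phi node.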

LLVM Advantages

LLVM is more than just a JIT. Because LLVM IR is largely platform independent, it becomes the best solution for making Python extension modules that need to support all platforms and all Python versions. A classic Python extension module is written in C, and must be compiled for each Python version, each OS, and each word size (32-bit and 64-bit)! (Python2+Python3+PyPy)*(Linux+OSX+Windows)*(32bits+64bits) = 18 targets. How is anybody supposed to compile their Python extension for all 18 targets? LLVM IR can be generated on any platform at any bit depth, saved to a file, and later loaded and run on any target that PyLLVM supports. PyLLVM works with Python2 and Python3, and is easily portable to PyPy using cpyext. In other words, LLVM IR can easily hit all 18 targets - no problem.
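To make the count of 18 concrete, the build matrix can be enumerated directly. This is only a sketch of the arithmetic above; the target naming scheme is made up for illustration.

```python
from itertools import product

interpreters = ["Python2", "Python3", "PyPy"]
systems = ["Linux", "OSX", "Windows"]
widths = ["32bit", "64bit"]

# One entry per (interpreter, OS, word size) combination.
targets = ["%s-%s-%s" % combo
           for combo in product(interpreters, systems, widths)]

for target in targets:
    print(target)
print(len(targets), "targets")
```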
Extra Advantages:

  • LLVM easily calls into C libraries
  • LLVM has a SIMD accelerated vector type
  • LLVM has powerful optimizations such as PROMOTE_MEMORY_TO_REGISTER
  • Rpython and LLVM are a natural fit
Still not convinced? Read what Intel has to say about LLVM.

source code

requires Mahadevan's PyLLVM

3 comments:

  1. No comment really just want to thank Brent for doing this. I have been working on specs for a Python stack that I think would be a good fit with web2py and pypy

    cheers guys!

  2. sorry I meant Brett

    nice art you create yourself?

    1. Brian, yes that's my artwork before i had learned coding and Python.
