PyPy is a very large project with a reputation for being hard to dive into. Some of this reputation is warranted, some of it is purely accidental. There are three important lessons that everyone willing to contribute should learn:
PyPy has layers. The 100-mile view:
RPython is the language in which we write interpreters. Not the entire PyPy project is written in RPython, only the parts that are compiled in the translation process. The interesting point is that RPython has no parser: it is compiled from the live Python objects, which makes it possible to do all kinds of metaprogramming at import time. In short, Python is a metaprogramming language for RPython.
The RPython standard library is to be found in the rlib subdirectory.
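To make the import-time metaprogramming point concrete, here is a minimal sketch in plain Python. It does not use the translation toolchain at all; the names `make_binop`, `BINOPS` and `evaluate` are hypothetical and chosen only for illustration. The idea it shows is that a loop can build specialized functions while the module is being imported, so a translator that starts from live objects would only see the finished functions, not the code that generated them.

```python
import operator


def make_binop(name, func):
    """Return a function specialized for one arithmetic operator."""
    def binop(left, right):
        return func(left, right)
    binop.__name__ = "binop_" + name
    return binop


# This loop runs once, at import time. Anything that inspects the module
# afterwards sees only the finished functions stored in BINOPS.
BINOPS = {}
for _name, _func in [("add", operator.add),
                     ("sub", operator.sub),
                     ("mul", operator.mul)]:
    BINOPS[_name] = make_binop(_name, _func)


def evaluate(opname, left, right):
    """Dispatch to one of the functions generated above."""
    return BINOPS[opname](left, right)
```

Calling `evaluate("add", 2, 3)` dispatches to the generated `binop_add` function. Whether a given construct like this is accepted by the translator depends on RPython's restrictions, which this sketch does not attempt to model.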
The translation toolchain - this is the part that takes care of translating RPython to flow graphs and then to C. The architecture document describes it in more detail.
It mostly lives in rpython, annotator and objspace/flow.
Python Interpreter
xxx
Python modules
xxx
Just-in-Time Compiler (JIT): we have a tracing JIT that traces the interpreter written in RPython, rather than the user program that it interprets. As a result it applies to any interpreter, i.e. any language. But getting it to work correctly is not trivial: it requires a small number of precise “hints” and possibly some small refactorings of the interpreter. The JIT itself also has several almost-independent parts: the tracer itself in jit/metainterp, the optimizer in jit/metainterp/optimizer that optimizes a list of residual operations, and the backend in jit/backend/<machine-name> that turns it into machine code. Writing a new backend is a traditional way to get into the project.
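As a rough illustration of what such a "hint" can look like, the sketch below marks the main loop of a toy interpreter with a `JitDriver` from `rpython.rlib.jit`. The toy bytecode, the `run` function and the choice of green and red variables are assumptions made for this example, not code from PyPy. So that the sketch also runs without the RPython toolchain installed, it falls back to a no-op stand-in class when the import fails.

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        """No-op stand-in (hypothetical) used when RPython is unavailable."""
        def __init__(self, greens=None, reds=None):
            self.greens = greens or []
            self.reds = reds or []

        def jit_merge_point(self, **live_vars):
            pass


# Greens identify a position in the interpreted program; reds are the
# remaining live variables. This split is an assumption for the toy below.
jitdriver = JitDriver(greens=['pc', 'program'], reds=['acc'])


def run(program, acc):
    """Interpret a toy bytecode string.

    '+' increments acc, '-' decrements it, and 'L' jumps back to the
    start of the program while acc is still positive.
    """
    pc = 0
    while pc < len(program):
        # The hint: tells the JIT where one iteration of the
        # interpreter's dispatch loop begins.
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
        op = program[pc]
        if op == '+':
            acc += 1
        elif op == '-':
            acc -= 1
        elif op == 'L' and acc > 0:
            pc = 0
            continue
        pc += 1
    return acc
```

Run as ordinary Python, the hint has no effect and `run` behaves like any other interpreter loop; the hint only matters once the interpreter is translated with the JIT enabled. A real interpreter would typically need further hints and some refactoring, as noted above.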
xxx