If you want to make Python run faster on the same hardware, you have two basic options, each with a drawback:
- You can create a replacement for the default runtime used by the language (the CPython implementation)—a major undertaking, but the result would be a drop-in replacement for CPython.
- You can rewrite existing Python code to take advantage of certain speed optimisations, which means more work for the programmer but doesn’t require changes in the runtime.
Here are six ways the bar on Python performance is being raised. Each uses one of these two approaches, or a combination of the two:
Among the candidates for a drop-in replacement for CPython, PyPy is easily the most visible (Quora, for instance, uses it in production). PyPy uses just-in-time (JIT) compilation to speed up Python code, and it also stands the best chance of becoming the default runtime, as it's highly compatible with existing Python code.
A long-standing drawback was that PyPy didn't integrate well with common libraries used to accelerate Python performance, such as NumPy. However, recent releases go a long way towards addressing this problem.
PyPy still has other limitations. It's best suited to long-running programs like servers, rather than one-and-done scripts, because its performance benefits don't really register until after some warmup time. And its executable has a much larger footprint than CPython's.
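A rough way to observe the warmup effect is to time the same CPU-bound function several times within one process. The sketch below uses only the standard library; `platform.python_implementation()` reports which runtime is executing (for example `'CPython'` or `'PyPy'`). The `workload` function and the round counts are arbitrary, hypothetical choices for illustration. Under a JIT runtime such as PyPy, later rounds would typically be faster than the first, while under CPython the timings should stay roughly flat; actual numbers depend on the machine and the runtime version.

```python
import platform
import time


def workload(n: int) -> int:
    """Hypothetical CPU-bound loop used only as a timing target."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def time_rounds(rounds: int = 5, n: int = 200_000) -> list[float]:
    """Run `workload` repeatedly in one process; return each run's duration in seconds."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        workload(n)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    print("Runtime:", platform.python_implementation())
    for round_no, seconds in enumerate(time_rounds(), start=1):
        print(f"round {round_no}: {seconds * 1000:.1f} ms")
```

Running the same file under CPython and under PyPy and comparing the first round with the later ones gives a feel for why PyPy favours long-running processes.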
The Pyston project, originally created by Dropbox but since relaunched and rewritten, also uses a JIT to speed up Python. Its original incarnation used the LLVM compiler infrastructure to do this, but the rewrite dropped LLVM in favour of a hand-rolled assembler with much lower overhead.
The rewrite also uses CPython code as the basis for the project, so it's far more compatible out of the box with conventional Python. Pyston's speedups are modest so far, about 20 per cent on average, but the project is still very much in its infancy.
Rather than replace the Python runtime, some teams are doing away with a Python runtime entirely and seeking ways to transpile Python code to languages that run natively at high speed. Case in point: Nuitka, which converts Python to C++ code and can also automatically bundle all of the files needed from the CPython runtime.
Long-term plans for Nuitka include letting Nuitka-compiled Python interface directly with C code, for even greater speed.
Cython (C extensions for Python) is a superset of Python, a version of the language that compiles to C and interfaces with C/C++ code. It’s one way to write C extensions for Python, which wrap C or C++ code and give it an easy Python interface.
But Cython can also be used to incrementally accelerate Python functions, chiefly ones that perform math. The downside is that Cython uses its own peculiar syntax to work its magic, so porting existing code isn’t totally automatic.
That said, Cython offers several speed-oriented features not available in vanilla Python, among them C-style static typing of variables. A number of scientific packages for Python, such as scikit-learn, draw on Cython features like this to keep operations lean and fast.
Numba combines two of the previous approaches. Like Cython, it speeds up the parts of the language that most need it (typically CPU-bound math); like PyPy and Pyston, it uses JIT compilation. Functions to be compiled with Numba are marked with a decorator, and Numba works best with libraries it already understands, NumPy in particular, whose arrays and functions it can accelerate directly.
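A minimal sketch of the decorator-based workflow, assuming Numba is installed: `numba.njit` compiles the decorated function to machine code the first time it is called with a given set of argument types. The function name `sum_of_squares` is hypothetical, and the `try`/`except` fallback is an assumption of this sketch, not part of Numba; it simply lets the same file run as ordinary Python when Numba is missing.

```python
try:
    from numba import njit  # JIT compiler; requires Numba to be installed
except ImportError:
    # Fallback for this sketch only: leave the function as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Tight numeric loop over integers: the kind of CPU-bound code Numba targets."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # Under Numba, the first call triggers compilation; later calls reuse the machine code.
    print(sum_of_squares(1_000_000))
```

One design caveat: under Numba, `total` becomes a fixed-width 64-bit integer, so a much larger `n` could overflow, whereas plain Python integers grow without limit.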
The typed_python project, a nascent effort supported by A Priori Investments, takes a different approach from any of the above. It provides a collection of strongly typed data structures for Python that are restricted in the types they can hold.
For instance, one could create a list that only accepts integers. From such structures, typed_python can generate highly optimised code that runs faster and takes advantage of processor parallelism where possible. One can write the majority of the program in conventional Python, then use typed_python within a specific function to speed up its operations. This is akin to how Cython can be used to selectively speed up the parts of an application that are bottlenecks.
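typed_python's own classes are not shown here. As a rough standard-library analogue of "a list that only accepts integers", Python's `array` module stores values of a single C type and rejects values of other types. Unlike typed_python, it compiles nothing, so this sketch illustrates only the type restriction, not the speedup.

```python
from array import array

# Typecode 'q' = signed 64-bit C integer ("long long"); every element must be an int.
ints = array("q", [1, 2, 3])
ints.append(4)  # accepted: an int

try:
    ints.append(4.5)  # rejected: a float is not an integer
except TypeError as exc:
    print("rejected:", exc)

print(ints.tolist())
```

Because every element is known to be a 64-bit integer, the container can store values compactly; typed_python builds on the same kind of guarantee to generate specialised code.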
Python creator Guido van Rossum is adamant that many of Python's performance issues can be traced to improper use of the language. CPU-heavy processing, for instance, can be sped up through a few methods touched on here: using NumPy (for math), using the multiprocessing module, or making calls to external C code, which can avoid the Global Interpreter Lock (GIL). The GIL lets only one thread execute Python bytecode at a time, and it is a major cause of Python's slowness on CPU-bound work.
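As a minimal sketch of the multiprocessing route, the standard library's `concurrent.futures.ProcessPoolExecutor` spreads a CPU-bound function across worker processes. Each process runs its own interpreter with its own GIL, so the workers can use separate cores. The function names and the chunking scheme are hypothetical, illustrative choices rather than a prescribed pattern.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_sum_of_squares(bounds: tuple[int, int]) -> int:
    """CPU-bound work for one chunk: sum of i*i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n: int, workers: int = 4) -> int:
    """Split range(n) into chunks and sum each chunk in a separate process."""
    if n <= 0:
        return 0
    step = -(-n // workers)  # ceiling division so every value in range(n) is covered
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum_of_squares, chunks))


if __name__ == "__main__":
    # The __main__ guard is required where workers are started with "spawn".
    print(parallel_sum_of_squares(1_000_000))
```

Processes cost more to start than threads, so this pays off only when each chunk carries enough work to outweigh that overhead.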
But since there’s no viable replacement yet for the GIL in Python, it falls to others to come up with short-term solutions—and maybe long-term ones, too.