A new package
A unified modern Python API for Python/Numpy accelerators
A thin layer between developers and Pythran / Cython / Numba
Python $\geqslant$ 3.6
Numpy API
an API to describe numerical types
Pure Python package (>= 3.6) to easily accelerate modern Python-Numpy code with different accelerators
Unified Python API to use different Python-Numpy accelerators
Keep your Python-Numpy code clean and "natural" 🧘
Clean type annotations (🐍 3)
Easily mix Python code and compiled functions
Ahead-of-time (AOT) and just-in-time (JIT) modes
JIT based on AOT compilers (especially for Pythran)
Accelerate functions, methods (of classes) and blocks of code
Work in progress! Current state: 3 backends based on Pythran, Cython and Numba
The following examples can be accelerated with Pythran, Cython and Numba.
import numpy as np
from transonic import boost

T0 = "int[:, :]"
T1 = "int[:]"

@boost
def row_sum(arr: T0, columns: T1):
    return arr.T[columns].sum(0)

@boost(boundscheck=False, wraparound=False)
def row_sum_loops(arr: T0, columns: T1):
    # local type annotations are used only by Cython
    i: int
    j: int
    sum_: int
    res: "int[]" = np.empty(arr.shape[0], dtype=arr.dtype)
    for i in range(arr.shape[0]):
        sum_ = 0
        for j in range(columns.shape[0]):
            sum_ += arr[i, columns[j]]
        res[i] = sum_
    return res
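Usage sketch (the array values are mine, for illustration): the boosted functions are called like plain Python functions.

    arr = np.arange(100).reshape(10, 10)
    columns = np.array([0, 2, 4])
    # both versions compute the same per-row sum over the selected columns
    assert np.array_equal(row_sum(arr, columns), row_sum_loops(arr, columns))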
import numpy as np
from transonic import jit

def add(a, b):
    return a + b

@jit
def func(a, b):
    return np.exp(a) * b * add(a, b)
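Usage sketch (values are mine): the function is called like plain Python; the first call triggers compilation in the background and the pure-Python version is used until the extension is ready.

    a = np.ones(1000)
    b = 2.0 * a
    result = func(a, b)  # plain call; acceleration kicks in once compiled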
import numpy as np
from transonic import Type, NDim, Array, boost

T = Type(int, float, np.complex128)
N = NDim(1, 2, 3)

A = Array[T, N]
A1 = Array[np.float32, N + 1]

@boost
def compute(a: A, b: A, c: T, d: A1):
    ...
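A hypothetical call (my values): one specialized signature can be generated for each combination of T and N, and d must have one more dimension than a and b.

    # a, b: complex128 2d arrays (T = np.complex128, N = 2), so d is float32 3d
    a = b = np.ones((4, 4), dtype=np.complex128)
    d = np.zeros((2, 2, 2), dtype=np.float32)
    compute(a, b, 1.0j, d)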
Inline functions

from transonic import boost

T = int

@boost(inline=True)
def add(a: T, b: T) -> T:
    return a + b

@boost
def use_add(n: int = 10000):
    tmp: int = 0
    for _ in range(n):
        tmp = add(tmp, 1)
    return tmp
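Usage sketch: in the compiled versions, add is inlined into the loop of use_add, removing the function-call overhead.

    print(use_add(10))  # 10: tmp goes from 0 to n by steps of 1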
from transonic import boost

@boost
class MyClass:
    attr: int

    @boost
    def numerical_kernel(self, arg: int):
        return self.attr + arg
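A hypothetical usage sketch (plain-Python mode; whether a boosted class needs an explicit __init__ may depend on the backend):

    obj = MyClass()
    obj.attr = 5
    print(obj.numerical_kernel(3))  # 8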
We try to fix issues in our community!
Incompatible accelerators
Cython "hegemony" (C-like code)
Pythran not as widely used/supported as it should be
Balance between time/energy spent, generality of the code, readability
"Premature optimization is the root of all evil" (Donald Knuth)
80/20 rule: efficiency is important for the expensive parts and NOT for the small ones
from transonic import boost

@boost
def my_numerical_kernel():
    ...
Not like CPython's compile(...), which only produces (high-level) virtual machine instructions, with nearly no optimization
One needs to write code that can be well optimized by a compiler!
Just-in-time (@jit)
Has to be fast (warm up), can be hardware specific
Ahead-of-time (@boost)
Can be slow, hardware specific or more general to distribute binaries
From one language to another language (for example Python to C++, or Cython to C)
programs (Nuitka)
slowest loops (PyPy)
modules (Cython, Pythran)
user-defined functions / methods (Numba, Transonic)
blocks of code (Transonic)
expressions (Numexpr)
call compiled functions (Numpy / Python)
Clearly, useless to properly compile all Python features (inspect, metaclass, ...)!
At least "Python used in numerical kernels" + Numpy!
User-defined classes?
Language: superset of Python
A great mix of Python / C / CPython C API!
Very powerful, but a tool for experts!
Easy to study where the interpreter is used (cython --annotate); see the sketch after this list.
Very mature
Very efficient for C-like code (explicit loops, "low level")
Now able to use Pythran internally...
In my experience, large Cython extensions are difficult to maintain
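A minimal sketch of typical C-like Cython code (my example, not from the talk; cython --annotate would show these loops running without the interpreter):

    # mysum.pyx: C-like Cython with explicit loops and C types
    cpdef long mysum(long[:] arr):
        cdef Py_ssize_t i
        cdef long total = 0
        for i in range(arr.shape[0]):
            total += arr[i]
        return total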
from numba import jit

@jit
def myfunc(x):
    return x**2
"nopython" mode (fast and no GIL) 🙂
Also a "python" mode 🙂
GPU and Cupy 😀
Methods (of classes) 🙂
Sometimes not as much efficient as it could be 🙁
(sometimes slower than Pythran / Julia / C++)
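A minimal sketch of the "nopython" mode (my example; njit is Numba's shortcut for jit(nopython=True)):

    from numba import njit

    @njit  # nopython mode: compiled, no GIL, fails if Python objects are needed
    def mysum(arr):
        total = 0
        for i in range(arr.shape[0]):
            total += arr[i]
        return total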
Transpiles Python to efficient C++ (see the sketch after this list)
Good to optimize high-level NumPy code 😎
Extensions never use the Python interpreter (pure C++ ⇒ no GIL) 🙂
Can produce C++ that can be used without Python
Usually very efficient (sometimes faster than Julia)
High and low level optimizations
(Python optimizations and C++ compilation)
SIMD 🤩 (with xsimd)
Understand OpenMP instructions 🤗 !
Can use and make PyCapsules (functions operating in the native world) 🙂
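A minimal sketch of plain Pythran usage, without Transonic (my example; the export comment declares the types and the module is compiled with the pythran command):

    # mymod.py, compiled with: pythran mymod.py
    # pythran export laplace(float64[:, :])

    def laplace(u):
        # high-level NumPy code transpiled to pure C++ (no interpreter, no GIL)
        return (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
                - 4 * u[1:-1, 1:-1])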
Compiles only full modules (⇒ refactoring needed 🙁)
Only "nopython" mode
limited to a subset of Python
limited to a few extension packages (Numpy + bits of Scipy)
Pythranized functions can't call Python functions
No JIT: types needed (written manually in comments)
Lengthy ⌛️ and memory-intensive compilations
Debugging 🐜 Pythran requires C++ skills
No GPU (maybe with OpenMP 4?)
Some compilers are unable to compile Pythran's C++11 👎
Small community, only 1 core-dev
Performance issues, especially for crunching numbers 🔢
⇒ need to accelerate the "numerical kernels"
Many good accelerators and compilers for Python-Numpy code
All have pros and cons! Very different technologies!
Diversity good for open-source
Pythran is really great
The Python community would be wise to use and support it!
⇒ We shouldn't have to write specialized code for one accelerator!
Incompatible accelerators + "Cython hegemony"
Pure Python package (>= 3.6) to easily accelerate modern Python-Numpy code with different accelerators
Unified Python API to use different Python-Numpy accelerators
JIT (@jit)
AOT compilation for functions and methods (@boost)
Blocks of code (with if ts.is_transpiled:, see the sketch after this list)
Parallelism with a class (adapted from Olivier Borderies)
omp/tsp.py (OpenMP) and tsp_concurrent.py (concurrent - threads)
Also compatible with MPI!
Works also well in simple scripts and IPython / Jupyter.
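A minimal sketch of a Transonic block, based on the if ts.is_transpiled pattern mentioned above (the block name "block0" and the annotation comment follow Transonic's documented convention, but details may differ across versions):

    import numpy as np
    from transonic import Transonic

    ts = Transonic()

    def compute(n: int):
        a = np.random.rand(n)
        b = np.random.rand(n)
        if ts.is_transpiled:
            # use the compiled version of the annotated block
            result = ts.use_block("block0")
        else:
            # transonic block (
            #     float[] a, b;
            # )
            result = (a**2 + b**2).sum()
        return result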
import numpy as np
from transonic import boost

T0 = "int[:, :]"
T1 = "int[:]"

@boost
def row_sum(arr: T0, columns: T1):
    return arr.T[columns].sum(0)

@boost
def row_sum_loops(arr: T0, columns: T1):
    # local type annotations are used only by Cython
    i: int
    j: int
    sum_: int
    res: "int[]" = np.empty(arr.shape[0], dtype=arr.dtype)
    for i in range(arr.shape[0]):
        sum_ = 0
        for j in range(columns.shape[0]):
            sum_ += arr[i, columns[j]]
        res[i] = sum_
    return res
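The timings below are obtained by running the script with different backends selected through the TRANSONIC_BACKEND environment variable; a hypothetical benchmark loop (not necessarily the exact script behind these numbers) could look like:

    # hypothetical benchmark sketch appended to row_sum_boost.py
    from time import perf_counter

    arr = np.random.randint(0, 100, size=(10_000, 1_000))
    columns = np.arange(0, 1_000, 2)

    for name, func in [("row_sum", row_sum), ("row_sum_loops", row_sum_loops)]:
        t0 = perf_counter()
        for _ in range(10):
            func(arr, columns)
        print(name, f"{perf_counter() - t0:.2f} s")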
TRANSONIC_BACKEND="python" python row_sum_boost.py
Python
row_sum 1.38 s
row_sum_loops 108.57 s
TRANSONIC_BACKEND="cython" python row_sum_boost.py
Cython
row_sum 1.32 s
row_sum_loops 0.38 s
TRANSONIC_BACKEND="numba" python row_sum_boost.py
Numba
row_sum 1.16 s
row_sum_loops 0.27 s
TRANSONIC_BACKEND="pythran" python row_sum_boost.py
Pythran
row_sum 0.76 s
row_sum_loops 0.27 s
import numpy as np
from transonic import jit

@jit(native=True, xsimd=True)
def row_sum(arr, columns):
    return arr.T[columns].sum(0)

@jit(native=True, xsimd=True)
def row_sum_loops(arr, columns):
    res = np.empty(arr.shape[0], dtype=arr.dtype)
    for i in range(arr.shape[0]):
        sum_ = 0
        for j in range(columns.shape[0]):
            sum_ += arr[i, columns[j]]
        res[i] = sum_
    return res
TRANSONIC_BACKEND="cython" python row_sum_jit.py
Cython
row_sum 1.28 s
row_sum_loops 11.94 s
TRANSONIC_BACKEND="numba" python row_sum_jit.py
Numba
row_sum 1.14 s
row_sum_loops 0.28 s
TRANSONIC_BACKEND="pythran" python row_sum_jit.py
Pythran
row_sum 0.76 s
row_sum_loops 0.28 s
Transonic: a unified Python API for different Python-Numpy accelerators
Very efficient scientific software can be written in clean modern Python
Improve the API, support more Cython, Pythran and Numba features (PyCapsules, dataclass, ...)
Improve backends, create other backends (Cupy, PyTorch, ...)
A possible future API for accelerated dataclasses (not yet implemented, hence the ImportError below):

from transonic import dataclass, boost

@dataclass
class MyStruct:
    attr: int

    def compute(self, arg: int):
        return self.attr + arg

    def modify(self, arg: int):
        self.attr = arg

@boost
def func(o: MyStruct, a: int):
    o.modify(a)
    return o

@boost
def func1(o: MyStruct, a: int):
    return o.compute(a)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-19-2550813677e6> in <module>
----> 1 from transonic import dataclass, boost

ImportError: cannot import name 'dataclass' from 'transonic' (/home/pierre/Dev/transonic/transonic/__init__.py)