A new package
A unified modern Python API for Python/Numpy accelerators
A thin layer between developers and Pythran / Cython / Numba
Python $\geqslant$ 3.6
Numpy API
an API to describe numerical types
Pure Python package (>= 3.6) to easily accelerate modern Python-Numpy code with different accelerators
Unified Python API to use different Python-Numpy accelerators
Keep your Python-Numpy code clean and "natural" 🧘
Clean type annotations (🐍 3)
Easily mix Python code and compiled functions
Ahead-of-time (AOT) and just-in-time (JIT) modes
JIT based on AOT compilers (especially for Pythran)
Accelerate functions, methods (of classes) and blocks of code
Work in progress! Current state: 3 backends based on Pythran, Cython and Numba
These codes can be accelerated with Pythran, Cython and Numba.
import numpy as np
from transonic import boost
T0 = "int[:, :]"
T1 = "int[:]"
@boost
def row_sum(arr: T0, columns: T1):
return arr.T[columns].sum(0)
@boost(boundscheck=False, wraparound=False)
def row_sum_loops(arr: T0, columns: T1):
# locals type annotations are used only for Cython
i: int
j: int
sum_: int
res: "int[]" = np.empty(arr.shape[0], dtype=arr.dtype)
for i in range(arr.shape[0]):
sum_ = 0
for j in range(columns.shape[0]):
sum_ += arr[i, columns[j]]
res[i] = sum_
return res
import numpy as np
from transonic import jit
def add(a, b):
return a + b
@jit
def func(a, b):
return np.exp(a) * b * add(a, b)
import numpy as np
from transonic import Type, NDim, Array, boost
T = Type(int, float, np.complex128)
N = NDim(1, 2, 3)
A = Array[T, N]
A1 = Array[np.float32, N + 1]
@boost
def compute(a: A, b: A, c: T, d: A1):
...
inline
functions¶from transonic import boost
T = int
@boost(inline=True)
def add(a: T, b: T) -> T:
return a + b
@boost
def use_add(n: int = 10000):
_: int
for _ in range(n):
tmp = add(tmp, 1)
return tmp
from transonic import boost
@boost
class MyClass:
attr: int
@boost
def numerical_kernel(self, arg: int):
return self.attr + arg
We try to fix issues of our community!
Incompatible accelerators
Cython "hegemony" (C-like code)
Pythran not as used/supported as it should
Balance between time/energy spent, generality of the code, readability
"Premature optimization is the root of all evil" (Donald Knuth)
80 / 20 rule, efficiency important for expensive things and NOT for small things
from transonic import boost
@boost
def my_numerical_kernel():
...
Not like CPython: compile(...)
to (high level) virtual machine instructions (with nearly no optimization)
One needs to write code that can be well optimized by a compiler!
Just-in-time (@jit
)
Has to be fast (warm up), can be hardware specific
Ahead-of-time (@boost
)
Can be slow, hardware specific or more general to distribute binaries
From one language to another language (for example Python to C++, or Cython to C)
programs (Nuitka)
slowest loops (PyPy)
modules (Cython, Pythran)
user-defined functions / methods (Numba, Transonic)
blocks of code (Transonic)
expressions (Numexp)
call compiled functions (Numpy / Python)
Clearly, useless to properly compile all Python features (inspect
, metaclass, ...)!
At least "Python used in numerical kernels" + Numpy!
User-defined classes?
Langage: superset of Python
A great mix of Python / C / CPython C API!
Very powerfull but a tool for experts!
Easy to study where the interpreter is used (cython --annotate
).
Very mature
Very efficient for C-like code (explicit loops, "low level")
Now able to use Pythran internally...
My experience: large Cython extensions difficult to maintain
from numba import jit
@jit
def myfunc(x):
return x**2
"nopython" mode (fast and no GIL) 🙂
Also a "python" mode 🙂
GPU and Cupy 😀
Methods (of classes) 🙂
Sometimes not as much efficient as it could be 🙁
(sometimes slower than Pythran / Julia / C++)
Transpiles Python to efficient C++
Good to optimize high-level NumPy code 😎
Extensions never use the Python interpreter (pure C++ ⇒ no GIL) 🙂
Can produce C++ that can be used without Python
Usually very efficient (sometimes faster than Julia)
High and low level optimizations
(Python optimizations and C++ compilation)
SIMD 🤩 (with xsimd)
Understand OpenMP instructions 🤗 !
Can use and make PyCapsules (functions operating in the native word) 🙂
Compile only full modules (⇒ refactoring needed 🙁)
Only "nopython" mode
limited to a subset of Python
limited to few extension packages (Numpy + bits of Scipy)
pythranized functions can't call Python functions
No JIT: need types (written manually in comments)
Lengthy ⌛️ and memory intensive compilations
Debugging 🐜 Pythran requires C++ skills
No GPU (maybe with OpenMP 4?)
compilers unable to compile Pythran C++11 👎
Small community, only 1 core-dev
Performance issues, especially for crunching numbers 🔢
⇒ need to accelerate the "numerical kernels"
Many good accelerators and compilers for Python-Numpy code
All have pros and cons! Very different technologies!
Diversity good for open-source
Pythran is really great
The Python community would be wise to use and support it!
⇒ We shouldn't have to write specialized code for one accelerator!
Incompatible accelerators + "Cython hegemony"
Pure Python package (>= 3.6) to easily accelerate modern Python-Numpy code with different accelerators
Unified Python API to use different Python-Numpy accelerators
JIT (@jit
)
AOT compilation for functions and methods (@boost
)
Blocks of code (with if ts.is_transpiled:
)
Parallelism with a class (adapted from Olivier Borderies)
omp/tsp.py (OpenMP) and tsp_concurrent.py (concurrent - threads)
Also compatible with MPI!
Works also well in simple scripts and IPython / Jupyter.
import numpy as np
from transonic import boost
T0 = "int[:, :]"
T1 = "int[:]"
@boost
def row_sum(arr: T0, columns: T1):
return arr.T[columns].sum(0)
@boost
def row_sum_loops(arr: T0, columns: T1):
# locals type annotations are used only for Cython
i: int
j: int
sum_: int
res: "int[]" = np.empty(arr.shape[0], dtype=arr.dtype)
for i in range(arr.shape[0]):
sum_ = 0
for j in range(columns.shape[0]):
sum_ += arr[i, columns[j]]
res[i] = sum_
return res
Python
high level: 1.31e-03 s (= 1.00 * norm)
low level: 1.04e-01 s (= 79.27 * norm)
Cython
high level: 1.29e-03 s (= 0.99 * norm)
low level: 4.10e-04 s (= 0.31 * norm)
Numba
high level: 1.04e-03 s (= 0.80 * norm)
low level: 2.69e-04 s (= 0.21 * norm)
Pythran
high level: 7.68e-04 s (= 0.59 * norm)
low level: 2.55e-04 s (= 0.19 * norm)
import numpy as np
from transonic import jit
@jit(native=True, xsimd=True)
def fxfy(ft, fn, theta):
sin_theta = np.sin(theta)
cos_theta = np.cos(theta)
fx = cos_theta * ft - sin_theta * fn
fy = sin_theta * ft + cos_theta * fn
return fx, fy
@jit(native=True, xsimd=True)
def fxfy_loops(ft, fn, theta):
n0 = theta.size
fx = np.empty_like(ft)
fy = np.empty_like(fn)
for index in range(n0):
sin_theta = np.sin(theta[index])
cos_theta = np.cos(theta[index])
fx[index] = cos_theta * ft[index] - sin_theta * fn[index]
fy[index] = sin_theta * ft[index] + cos_theta * fn[index]
return fx, fy
fxfy (pure-Numpy) : 1.000 * norm
norm = 6.90e-04 s
fxfy_numba : 0.952 * norm
fxfy_loops_numba : 0.776 * norm
fxfy_pythran : 0.152 * norm
fxfy_loops_pythran : 0.784 * norm
Transonic: a unified Python API for different Python-Numpy accelerators
Very efficient scientific software can be written in clean modern Python
Improve the API, support more Cython, Pythran and Numba (PyCapsules, dataclass, ...)
Usage in important / fundamental packages (for example Scikit-image)
Improve backends, create other backends (Cupy, PyTorch, ...)
from transonic import dataclass, boost
@dataclass
class MyStruct:
attr: int
def compute(self, arg: int):
return self.attr + arg
def modify(self, arg: int):
self.attr = arg
@boost
def func(o: MyStruct, a: int):
o.modify(a)
return o
@boost
def func1(o: MyStruct, a: int):
return o.compute(a)
--------------------------------------------------------------------------- ImportError Traceback (most recent call last) <ipython-input-19-2550813677e6> in <module> ----> 1 from transonic import dataclass, boost 2 3 @dataclass 4 class MyStruct: 5 attr: int ImportError: cannot import name 'dataclass' from 'transonic' (/home/pierre/Dev/transonic/transonic/__init__.py)