I am writing this note to present some issues and to study how they could be solved.
The pure-Numpy code ¶
Let’s consider this simple method taken from the Fluidsim code:
def _time_step_RK2(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()

    state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2
    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    self.sim.state.state_spect = (
        state_spect * diss + dt * diss2 * tendencies_n12
    )
- dt is a float.
- diss and diss2 are 1d, 2d or 3d Numpy arrays of dtype float or complex.
- state_spect and the “tendencies” variables are 2d, 3d or 4d Numpy arrays of dtype float or complex. Actually, they are special Numpy arrays of a user-defined type called SetOfVariables (see the small sketch just after this list).
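To make these shapes concrete, here is a minimal self-contained sketch of the two Numpy expressions at the heart of _time_step_RK2, on plain arrays. The sizes (3 variables on a 128 x 128 spectral grid) are purely illustrative, and the tendencies are random numbers instead of being computed by a solver:

import numpy as np

# Illustrative sizes (not the real Fluidsim ones): 3 variables on a 128 x 128 grid.
nk, n0, n1 = 3, 128, 128
dt = 0.01

diss = np.random.rand(n0, n1)    # float64, 2d
diss2 = np.random.rand(n0, n1)   # float64, 2d
shape = (nk, n0, n1)
state_spect = np.random.rand(*shape) + 1j * np.random.rand(*shape)   # complex128, 3d
tendencies_n = np.random.rand(*shape) + 1j * np.random.rand(*shape)  # stand-in

# The two expressions of the RK2 scheme (the 2d coefficients broadcast
# over the first axis of the 3d arrays):
state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2
tendencies_n12 = tendencies_n  # stand-in for compute_tendencies(state_spect_n12, ...)
state_spect = state_spect * diss + dt * diss2 * tendencies_n12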
The Cython solution ¶
For each type and each shape, one has to write something like:
@cython.embedsignature(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def _time_step_RK2_state_ndim3_freqlin_ndim2_float(self):
    cdef double dt = self.deltat
    cdef Py_ssize_t i0, i1, ik, nk, n0, n1
    cdef np.ndarray[double, ndim=2] diss, diss2
    cdef np.ndarray[DTYPEc_t, ndim=3] state_spect, datat, datatemp

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    nk = state_spect.shape[0]
    n0 = state_spect.shape[1]
    n1 = state_spect.shape[2]

    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    tendencies_n = compute_tendencies()
    datat = tendencies_n
    datatemp = state_spect_n12 = self.state_spect_tmp

    for ik in range(nk):
        for i0 in range(n0):
            for i1 in range(n1):
                datatemp[ik, i0, i1] = (
                    state_spect[ik, i0, i1] + dt / 2 * datat[ik, i0, i1]
                ) * diss2[i0, i1]

    datat = tendencies_n12 = compute_tendencies(
        state_spect_n12, old=tendencies_n
    )

    for ik in range(nk):
        for i0 in range(n0):
            for i1 in range(n1):
                state_spect[ik, i0, i1] = (
                    state_spect[ik, i0, i1] * diss[i0, i1]
                    + dt * diss2[i0, i1] * datat[ik, i0, i1]
                )
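On top of the duplication, something has to pick the right specialization at run time. A minimal sketch of how such a dispatch could be organized (the dictionary, the dispatch function and the names in the comments are invented for illustration; this is not the actual Fluidsim code):

import numpy as np

# Hypothetical dispatch table: one hand-written Cython function per
# combination of the dimension of state_spect and the dtype of diss.
_RK2_SPECIALIZATIONS = {
    (3, np.float64): _time_step_RK2_state_ndim3_freqlin_ndim2_float,
    # (3, np.complex128): _time_step_RK2_state_ndim3_freqlin_ndim2_complex,
    # (4, np.float64): _time_step_RK2_state_ndim4_freqlin_ndim3_float,
    # ... one entry (and one hand-written function) per supported case.
}


def _time_step_RK2_dispatch(self):
    state_spect = self.sim.state.state_spect
    diss, _ = self.exact_linear_coefs.get_updated_coefs()
    func = _RK2_SPECIALIZATIONS[(state_spect.ndim, diss.dtype.type)]
    return func(self)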
The Pythran solution ¶
from .rk_pythran import step0_RK2_pythran, step1_RK2_pythran


def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    # state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2
    step0_RK2_pythran(state_spect_n12, state_spect, tendencies_n, diss2, dt)

    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    # state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
    step1_RK2_pythran(state_spect, tendencies_n12, diss, diss2, dt)
And in another file (rk_pythran.py), which will have to be compiled by Pythran:
# pythran export step0_RK2_pythran(
#     complex128[][][],
#     complex128[][][],
#     complex128[][][],
#     float64[][] or complex128[][],
#     float
# )

# pythran export step0_RK2_pythran(
#     complex128[][][][],
#     complex128[][][][],
#     complex128[][][][],
#     float64[][][] or complex128[][][],
#     float
# )

def step0_RK2_pythran(state_spect_n12, state_spect, tendencies_n, diss2, dt):
    state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2


# pythran export step1_RK2_pythran(
#     complex128[][][],
#     complex128[][][],
#     float64[][] or complex128[][],
#     float64[][] or complex128[][],
#     float
# )

# pythran export step1_RK2_pythran(
#     complex128[][][][],
#     complex128[][][][],
#     float64[][][] or complex128[][][],
#     float64[][][] or complex128[][][],
#     float
# )

def step1_RK2_pythran(state_spect, tendencies_n12, diss, diss2, dt):
    state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
The great advantage is that we can keep the Numpy abstraction…
As written by Serge Guelton, “Pythran vectorizes the expression template and generates calls to xsimd”.
However, by splitting the code into two files, we lose the coherence of the method (even though, for this example, it is still quite readable).
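For completeness, rk_pythran.py has to be compiled ahead of time, either with the pythran command line (pythran rk_pythran.py) or from a setup.py. A minimal sketch using Pythran's distutils integration (the metadata is of course illustrative):

# setup.py (sketch): build rk_pythran.py into a native extension with Pythran.
from distutils.core import setup
from pythran.dist import PythranBuildExt, PythranExtension

setup(
    name="rk_pythran",
    ext_modules=[PythranExtension("rk_pythran", sources=["rk_pythran.py"])],
    cmdclass={"build_ext": PythranBuildExt},
)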
Cython + Pythran ¶
Pythran can be used by Cython (thanks in particular to Adrien Guinet). Could we use that here? Unfortunately, I don’t see how to write the Pythran type annotations with it.
Can we do better? Science fiction! ¶
First, let’s start from this version of the code, since it could be faster (no memory allocation):
def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2
    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)
    state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
Cython + Pythran ¶
We could just add new Pythran annotations that Cython could use…
def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    # pythran block (
    #     complex128[][][] state_spect_n12, state_spect, tendencies_n,
    #     float64[][] or complex128[][] diss2,
    #     float dt,
    # )
    # pythran block (
    #     complex128[][][][] state_spect_n12, state_spect, tendencies_n,
    #     float64[][][] or complex128[][][] diss2,
    #     float dt,
    # )
    state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2
    # pythran end block

    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    # pythran block (
    #     complex128[][][] state_spect, tendencies_n12,
    #     float64[][] or complex128[][] diss, diss2,
    #     float dt,
    # )
    # pythran block (
    #     complex128[][][][] state_spect, tendencies_n12,
    #     float64[][][] or complex128[][][] diss, diss2,
    #     float dt,
    # )
    state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
    # pythran end block
Python + Pythran ¶
It would also be nice to have a pure-Python syntax for this. There would of course be a “compilation” step to create a Pythran file from the main Python file containing the class definition.
I worked on a proof of concept called fluidpythran, so I use its syntax here.
from fluidpythran import FluidPythran

fp = FluidPythran()


def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    if fp.is_pythranized:
        fp.use_pythranized_block("RK2_step0")
    else:
        # pythran block (
        #     complex128[][][] state_spect_n12, state_spect, tendencies_n;
        #     float64[][] or complex128[][] diss2;
        #     float dt;
        # )
        # pythran block (
        #     complex128[][][][] state_spect_n12, state_spect, tendencies_n;
        #     float64[][][] or complex128[][][] diss2;
        #     float dt;
        # )
        state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2

    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    if fp.is_pythranized:
        fp.use_pythranized_block("RK2_step1")
    else:
        # pythran block (
        #     complex128[][][] state_spect, tendencies_n12;
        #     float64[][] or complex128[][] diss, diss2;
        #     float dt;
        # )
        # pythran block (
        #     complex128[][][][] state_spect, tendencies_n12;
        #     float64[][][] or complex128[][][] diss, diss2;
        #     float dt;
        # )
        state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
The block can also create variables, for example:
def other_func(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()

    if fp.is_pythranized:
        state_spect_n12 = fp.use_pythranized_block("RK2_step1")
    else:
        # pythran block (
        #     complex128[][][] state_spect, tendencies_n;
        #     float64[][] or complex128[][] diss2;
        #     float dt;
        # ) -> (state_spect_n12)
        # pythran block (
        #     complex128[][][][] state_spect, tendencies_n;
        #     float64[][][] or complex128[][][] diss2;
        #     float dt;
        # ) -> (state_spect_n12)
        state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2

    ...
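For such a block, the “compilation” step would have to generate an exported Pythran function whose return value corresponds to the “-> (state_spect_n12)” annotation; roughly something like this (the generated name is invented, and only the 3d signature is shown):

# Hypothetical code generated by the fluidpythran "compilation" step.

# pythran export rk2_step1_block(complex128[][][], complex128[][][], float64[][] or complex128[][], float)
def rk2_step1_block(state_spect, tendencies_n, diss2, dt):
    # The variable created in the block becomes the return value,
    # which use_pythranized_block hands back to the caller.
    state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2
    return state_spect_n12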
This pure-Python solution has advantages:
- The syntax is clear and quite explicit. The code remains in the function. The type annotations are even easier to write than for # pythran export.
- It is pure Python. Pypy (and other implementations) should be happy.
- The performance cost at run time should be very small.
It also has disadvantages:
- A performance cost at run time, which can be non-negligible in some cases (to be quantified). Note that it could be possible to minimize it with Pypy or Cython.
- It is not so explicit and there is a little bit of magic: use_pythranized_block has to modify objects with references in its caller (in the example, state_spect).
- The two branches of the if clause are of course not equivalent. A variable modified in the block that is not explicitly mentioned as returned by the Pythran function won't be modified in the Pythran branch. This can be error-prone (see the sketch after this list).
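To make this last pitfall concrete, here is a small made-up example reusing only the calls introduced above (the block name and the surrounding function are invented for illustration):

import numpy as np
from fluidpythran import FluidPythran

fp = FluidPythran()


def func_with_pitfall(a, dt):
    energy = 0.0
    if fp.is_pythranized:
        result = fp.use_pythranized_block("block_with_pitfall")
    else:
        # pythran block (
        #     float64[][] a;
        #     float dt;
        # ) -> (result)
        result = dt * a
        # `energy` is assigned in the block but not listed in "-> (...)";
        # in the pythranized branch, this assignment never reaches the caller
        # and `energy` keeps its previous value.
        energy = (a ** 2).sum()
    return result, energy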