I am writing this note to present some issues and to study how they could be solved.
The pure-Numpy code ¶
Let’s consider this simple method taken from the Fluidsim code:
def _time_step_RK2(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()

    state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2
    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    self.sim.state.state_spect = (
        state_spect * diss + dt * diss2 * tendencies_n12
    )
- dt is a float.
- diss and diss2 are 1d, 2d or 3d Numpy arrays of dtype float or complex.
- state_spect and the “tendencies” variables are 2d, 3d or 4d Numpy arrays of dtype float or complex. Actually, they are special Numpy arrays of a user-defined type called SetOfVariables (see the small sketch just after this list).
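To make these shapes concrete, here is a minimal self-contained sketch of the two Numpy expressions at the heart of _time_step_RK2, on plain arrays. The sizes (3 variables on a 128 x 128 spectral grid) are purely illustrative, and the tendencies are random numbers instead of being computed by a solver:

import numpy as np

# Illustrative sizes (not the real Fluidsim ones): 3 variables on a 128 x 128 grid.
nk, n0, n1 = 3, 128, 128
dt = 0.01

diss = np.random.rand(n0, n1)    # float64, 2d
diss2 = np.random.rand(n0, n1)   # float64, 2d
shape = (nk, n0, n1)
state_spect = np.random.rand(*shape) + 1j * np.random.rand(*shape)   # complex128, 3d
tendencies_n = np.random.rand(*shape) + 1j * np.random.rand(*shape)  # stand-in

# The two expressions of the RK2 scheme (the 2d coefficients broadcast
# over the first axis of the 3d arrays):
state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2
tendencies_n12 = tendencies_n  # stand-in for compute_tendencies(state_spect_n12, ...)
state_spect = state_spect * diss + dt * diss2 * tendencies_n12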
The Cython solution ¶
For each type and each shape, one has to write something like:
@cython.embedsignature(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def _time_step_RK2_state_ndim3_freqlin_ndim2_float(self):
    cdef double dt = self.deltat
    cdef Py_ssize_t i0, i1, ik, nk, n0, n1
    cdef np.ndarray[double, ndim=2] diss, diss2
    cdef np.ndarray[DTYPEc_t, ndim=3] state_spect, datat, datatemp

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    nk = state_spect.shape[0]
    n0 = state_spect.shape[1]
    n1 = state_spect.shape[2]

    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    tendencies_n = compute_tendencies()
    datat = tendencies_n
    datatemp = state_spect_n12 = self.state_spect_tmp

    for ik in range(nk):
        for i0 in range(n0):
            for i1 in range(n1):
                datatemp[ik, i0, i1] = (
                    state_spect[ik, i0, i1] + dt / 2 * datat[ik, i0, i1]
                ) * diss2[i0, i1]

    datat = tendencies_n12 = compute_tendencies(
        state_spect_n12, old=tendencies_n
    )

    for ik in range(nk):
        for i0 in range(n0):
            for i1 in range(n1):
                state_spect[ik, i0, i1] = (
                    state_spect[ik, i0, i1] * diss[i0, i1]
                    + dt * diss2[i0, i1] * datat[ik, i0, i1]
                )
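On top of the duplication, something has to pick the right specialization at run time. A minimal sketch of how such a dispatch could be organized (the dictionary, the dispatch function and the names in the comments are invented for illustration; this is not the actual Fluidsim code):

import numpy as np

# Hypothetical dispatch table: one hand-written Cython function per
# combination of the dimension of state_spect and the dtype of diss.
_RK2_SPECIALIZATIONS = {
    (3, np.float64): _time_step_RK2_state_ndim3_freqlin_ndim2_float,
    # (3, np.complex128): _time_step_RK2_state_ndim3_freqlin_ndim2_complex,
    # (4, np.float64): _time_step_RK2_state_ndim4_freqlin_ndim3_float,
    # ... one entry (and one hand-written function) per supported case.
}


def _time_step_RK2_dispatch(self):
    state_spect = self.sim.state.state_spect
    diss, _ = self.exact_linear_coefs.get_updated_coefs()
    func = _RK2_SPECIALIZATIONS[(state_spect.ndim, diss.dtype.type)]
    return func(self)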
The Pythran solution ¶
from .rk_pythran import step0_RK2_pythran, step1_RK2_pythran


def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    # state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2
    step0_RK2_pythran(state_spect_n12, state_spect, tendencies_n, diss2, dt)

    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    # state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
    step1_RK2_pythran(state_spect, tendencies_n12, diss, diss2, dt)
And in another file (rk_pythran.py), which will have to be compiled by Pythran:
# pythran export step0_RK2_pythran(
#     complex128[][][],
#     complex128[][][],
#     complex128[][][],
#     float64[][] or complex128[][],
#     float
# )

# pythran export step0_RK2_pythran(
#     complex128[][][][],
#     complex128[][][][],
#     complex128[][][][],
#     float64[][][] or complex128[][][],
#     float
# )

def step0_RK2_pythran(state_spect_n12, state_spect, tendencies_n, diss2, dt):
    state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2


# pythran export step1_RK2_pythran(
#     complex128[][][],
#     complex128[][][],
#     float64[][] or complex128[][],
#     float64[][] or complex128[][],
#     float
# )

# pythran export step1_RK2_pythran(
#     complex128[][][][],
#     complex128[][][][],
#     float64[][][] or complex128[][][],
#     float64[][][] or complex128[][][],
#     float
# )

def step1_RK2_pythran(state_spect, tendencies_n12, diss, diss2, dt):
    state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
The great advantage is that we can keep the Numpy abstraction…
As written by Serge Guelton, “Pythran vectorizes the expression template and generates calls to xsimd”.
However, by splitting the code into two files, we lose the coherence of the method (even though, for this example, it is still quite readable).
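For completeness, rk_pythran.py has to be compiled ahead of time, either with the pythran command line (pythran rk_pythran.py) or from a setup.py. A minimal sketch using Pythran's distutils integration (the metadata is of course illustrative):

# setup.py (sketch): build rk_pythran.py into a native extension with Pythran.
from distutils.core import setup
from pythran.dist import PythranBuildExt, PythranExtension

setup(
    name="rk_pythran",
    ext_modules=[PythranExtension("rk_pythran", sources=["rk_pythran.py"])],
    cmdclass={"build_ext": PythranBuildExt},
)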
Cython + Pythran ¶
Pythran can be used by Cython (thanks in particular to Adrien Guinet). Could we use that here? Unfortunately, I don’t see how to write the Pythran type annotations with it.
Can we do better? Science fiction! ¶
First, let’s start from this version of the code, since it could be faster (no memory allocation):
def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2
    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)
    state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
Cython + Pythran ¶
We could just add new Pythran annotations that Cython could use…
def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    # pythran block (
    #     complex128[][][] state_spect_n12, state_spect, tendencies_n,
    #     float64[][] or complex128[][] diss2,
    #     float dt,
    # )
    # pythran block (
    #     complex128[][][][] state_spect_n12, state_spect, tendencies_n,
    #     float64[][][] or complex128[][][] diss2,
    #     float dt,
    # )
    state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2
    # pythran end block

    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    # pythran block (
    #     complex128[][][] state_spect, tendencies_n12,
    #     float64[][] or complex128[][] diss, diss2,
    #     float dt,
    # )
    # pythran block (
    #     complex128[][][][] state_spect, tendencies_n12,
    #     float64[][][] or complex128[][][] diss, diss2,
    #     float dt,
    # )
    state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
    # pythran end block
Python + Pythran ¶
It would also be nice to have a pure-Python syntax for this. There would of course be a “compilation” step to create a Pythran file from the main Python file containing the class definition.
I worked on a proof of concept called fluidpythran, so I use its syntax here.
from fluidpythran import FluidPythran

fp = FluidPythran()


def _time_step_RK2_pythran(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()
    state_spect_n12 = self._state_spect_tmp

    if fp.is_pythranized:
        fp.use_pythranized_block("RK2_step0")
    else:
        # pythran block (
        #     complex128[][][] state_spect_n12, state_spect, tendencies_n;
        #     float64[][] or complex128[][] diss2;
        #     float dt;
        # )
        # pythran block (
        #     complex128[][][][] state_spect_n12, state_spect, tendencies_n;
        #     float64[][][] or complex128[][][] diss2;
        #     float dt;
        # )
        state_spect_n12[:] = (state_spect + dt / 2 * tendencies_n) * diss2

    tendencies_n12 = compute_tendencies(state_spect_n12, old=tendencies_n)

    if fp.is_pythranized:
        fp.use_pythranized_block("RK2_step1")
    else:
        # pythran block (
        #     complex128[][][] state_spect, tendencies_n12;
        #     float64[][] or complex128[][] diss, diss2;
        #     float dt;
        # )
        # pythran block (
        #     complex128[][][][] state_spect, tendencies_n12;
        #     float64[][][] or complex128[][][] diss, diss2;
        #     float dt;
        # )
        state_spect[:] = state_spect * diss + dt * diss2 * tendencies_n12
The block can also create variables, for example:
def other_func(self):
    dt = self.deltat
    diss, diss2 = self.exact_linear_coefs.get_updated_coefs()

    compute_tendencies = self.sim.tendencies_nonlin
    state_spect = self.sim.state.state_spect

    tendencies_n = compute_tendencies()

    if fp.is_pythranized:
        state_spect_n12 = fp.use_pythranized_block("RK2_step1")
    else:
        # pythran block (
        #     complex128[][][] state_spect, tendencies_n;
        #     float64[][] or complex128[][] diss2;
        #     float dt;
        # ) -> (state_spect_n12)
        # pythran block (
        #     complex128[][][][] state_spect, tendencies_n;
        #     float64[][][] or complex128[][][] diss2;
        #     float dt;
        # ) -> (state_spect_n12)
        state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2

    ...
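For such a block, the “compilation” step would have to generate an exported Pythran function whose return value corresponds to the “-> (state_spect_n12)” annotation; roughly something like this (the generated name is invented, and only the 3d signature is shown):

# Hypothetical code generated by the fluidpythran "compilation" step.

# pythran export rk2_step1_block(complex128[][][], complex128[][][], float64[][] or complex128[][], float)
def rk2_step1_block(state_spect, tendencies_n, diss2, dt):
    # The variable created in the block becomes the return value,
    # which use_pythranized_block hands back to the caller.
    state_spect_n12 = (state_spect + dt / 2 * tendencies_n) * diss2
    return state_spect_n12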
This pure-Python solution has advantages:
- The syntax is clear and quite explicit. The code remains in the function. The type annotations are even easier to write than for # pythran export.
- It is pure Python. Pypy (and other implementations) should be happy.
- The performance cost at run time should be very small.
It also has disadvantages:
- A performance cost at run time, which can be non-negligible in some cases (to be quantified). Note that it could be possible to minimize it with Pypy or Cython.
- It is not so explicit and there is a little bit of magic: use_pythranized_block has to modify objects with references in its caller (in the example, state_spect).
- The two branches of the if clause are of course not equivalent. A variable modified in the block that is not explicitly mentioned as returned by the Pythran function won't be modified in the Pythran branch. This can be error-prone (see the sketch after this list).
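To make this last pitfall concrete, here is a small made-up example reusing only the calls introduced above (the block name and the surrounding function are invented for illustration):

import numpy as np
from fluidpythran import FluidPythran

fp = FluidPythran()


def func_with_pitfall(a, dt):
    energy = 0.0
    if fp.is_pythranized:
        result = fp.use_pythranized_block("block_with_pitfall")
    else:
        # pythran block (
        #     float64[][] a;
        #     float dt;
        # ) -> (result)
        result = dt * a
        # `energy` is assigned in the block but not listed in "-> (...)";
        # in the pythranized branch, this assignment never reaches the caller
        # and `energy` keeps its previous value.
        energy = (a ** 2).sum()
    return result, energy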