Python Native Interface for faster and stronger Python

was: Universal Python extensions: performance, compatibility, sustainability, and less CO₂




Pierre Augier


PyConFR, 2nd November 2025

About me (Pierre Augier)

  • CNRS researcher in fluid mechanics (geophysical turbulence, …)


  • Maintain software for research in fluid mechanics

  • Practical research on how people can

    • Use Python for HPC
    • Develop for science
    • Teach programming and computing for science

Performance of Python interpreters

  • CPython is still VERY slow

  • Is it an issue?

    YES (long term).

    Strongly limits what can be written in Python and how it can be written

  • Can it be fixed?

    Partly, but deep changes are needed, ecosystem-wide

    1. Python C API
    2. Funding

Recent notable improvement of the situation

When I proposed the abstract

A depressing situation

  • Faster CPython, PyPy and HPy in trouble
  • No solution

Now (after the C API Summit, EuroPython 2025)

  • A potential realistic solution (Python Native Interface)

  • A lot of work needed (ecosystem-wide investment) but seems doable

Glossary

Python: the language
CPython: the reference implementation (written in C & Python)
PyPy: an alternative implementation (written in Python)
GraalPy: an alternative implementation (written in Java)
Extensions: libraries (written in C, Rust, …) usable as a module
Python C API: Application Programming Interface, the set of C functions to interact with the interpreter
ABI: Application Binary Interface
HPy: a project proposing an alternative C API
Cython: a language (superset of Python) and a compiler

CPython is still very slow

  • relatively small improvements…

  • but still very slow compared to …

    • other dynamic languages

      (JavaScript, Julia, PHP, Matlab, …)

    • alternative Python interpreters oriented towards performance (PyPy and GraalPy)

CPython is still very slow

def short_comp(n):
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def long_comp(num):
    result = short_comp(0)
    for i in range(num):
        result += short_comp(i+1) - short_comp(i)
    result -= short_comp(num)
    return result
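
The script bench_loops_sum.py itself is not shown on the slides. Below is a minimal sketch of a harness that could produce a "calls per second" figure for the two functions above. The helper name `measure_rate` and the values of `num`, `repeats` and `calls` are illustrative assumptions, not taken from the talk.

```python
import statistics
import time


def short_comp(n):
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def long_comp(num):
    result = short_comp(0)
    for i in range(num):
        result += short_comp(i + 1) - short_comp(i)
    result -= short_comp(num)
    return result


def measure_rate(num=200, repeats=5, calls=10):
    """Return (mean, stdev) of long_comp(num) calls per second.

    num, repeats and calls are arbitrary illustrative values.
    """
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(calls):
            long_comp(num)
        elapsed = time.perf_counter() - start
        rates.append(calls / elapsed)
    return statistics.mean(rates), statistics.stdev(rates)


if __name__ == "__main__":
    mean, std = measure_rate()
    print(f"Number of long_comp per second: {mean:.2f} ± {std:.2f}")
```

The sum in long_comp telescopes, so the function always returns 0: the benchmark measures pure interpreter overhead (nested loops, function calls, integer additions) rather than any useful computation.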

CPython is still very slow

CPython

$ $(uv python find 3.11) bench_loops_sum.py
3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0]
JIT Compiler: unsupported
Number of long_comp per second: 56.83 ± 0.92

$ $(uv python find 3.14) bench_loops_sum.py
3.14.0 (main, Oct 10 2025, 12:47:49) [Clang 20.1.4 ]
JIT Compiler: disabled
Number of long_comp per second: 72.63 ± 2.53

$ PYTHON_JIT=1 $(uv python find 3.14) bench_loops_sum.py
3.14.0 (main, Oct 10 2025, 12:47:49) [Clang 20.1.4 ]
JIT Compiler: enabled ✨
Number of long_comp per second: 60.55 ± 2.23

CPython is still very slow

PyPy

$ $(uv python find pypy) bench_loops_sum.py
3.11.11 (0253c85bf5f8, Feb 26 2025, 10:42:42)
[PyPy 7.3.19 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
JIT Compiler: enabled ✨
Number of long_comp per second: 919.85 ± 170.55

GraalPy

$ $(uv python find graalpy) bench_loops_sum.py
3.11.7 (Wed Apr 02 19:57:13 UTC 2025)
[Graal, Oracle GraalVM, Java 24.0.1 (amd64)]
JIT Compiler: enabled ✨
Number of long_comp per second: 1017.03 ± 21.86

CPython is still very slow

Speedup versus CPython 3.11

                without JIT   with JIT
CPython 3.9        0.74
CPython 3.11       1.00
CPython 3.14       1.28         1.07
PyPy                            16.2
GraalPy                         17.9
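
The speedups are the measured rates divided by the CPython 3.11 rate. A short check using only the numbers printed in the terminal runs above (no CPython 3.9 run is shown, so that row is left out):

```python
# Rates (long_comp calls per second) copied from the runs shown above.
rates = {
    "CPython 3.11": 56.83,
    "CPython 3.14": 72.63,
    "CPython 3.14 + JIT": 60.55,
    "PyPy": 919.85,
    "GraalPy": 1017.03,
}

baseline = rates["CPython 3.11"]
speedups = {name: rate / baseline for name, rate in rates.items()}

for name, speedup in speedups.items():
    print(f"{name:20s} {speedup:6.2f}")
```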

Python slowness: standard responses

  • Extensions using the Python C API

    • Other native languages (C, Cython, Rust, …)
    • AOT Python compilers (Pythran, Cython, …)
    • JIT Python compilers (Numba, …)
  • Performance oriented Python implementations based on JIT

Warning

Incompatible strategies

CPython C API

  • Interacts with the Python interpreter from C code

  • A cause of the great Python success

  • Used nearly everywhere

  • Several historical issues

  • C API Working Group: several improvements

Focus for this presentation

Incompatibility with performance oriented Python implementations

Performance oriented Python implementations

Python implementations using JIT…

  • PyPy
  • GraalPy
  • CPython (recent and not mature)

Focus on full Python implementations

Nothing on Numba (Python-NumPy method-based JIT compiler based on LLVM)

JITs in Python implementations

Tracing JITs

  • detection of hot loops
  • creation, optimization and compilation of “traces”

Only Python JITs

Bad interaction with extensions

JITs in Python implementations

PyPy and GraalPy: meta-JIT

  • Interpreter written with a JIT framework

              Framework          Language
    PyPy      RPython            Python 2.7
    GraalPy   Truffle/GraalVM    Java

  • JIT “for free”

JITs in Python implementations

CPython

  • traces of micro-ops

  • Copy & Patch method to produce the machine code

  • not mature yet

  • fundamentally more limited than meta JITs

    Several Python functions written in C

Only a JIT is not sufficient!

Performance strongly depends on the interpreter implementation.

  • Garbage collection: moving GC versus ref counting

  • Internal representation of objects

    For example, [1, 2, 3] represented in PyPy by an array of native integers.
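
To give a feeling for the difference, here is a rough sketch run on CPython. It compares a regular list (pointers to boxed int objects) with the standard library `array` type, used here only as a stand-in for a packed native representation. It does not show PyPy internals, and `sys.getsizeof` is CPython-specific.

```python
import sys
from array import array

values = list(range(1000, 2000))

# CPython list: an array of pointers plus one heap-allocated int object each.
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('q'): one contiguous buffer of native 8-byte signed integers.
packed = array("q", values)
packed_bytes = sys.getsizeof(packed)

print(f"list of boxed ints : {boxed_bytes} bytes")
print(f"packed native ints : {packed_bytes} bytes")
```

A packed layout also lets a JIT work directly on machine integers, without allocating an object per element.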

Results mature Python JITs

  • On PyPerformance, typically 4× faster

  • In many cases, typically 20× faster

  • Zero-cost abstraction + small objects

  • Strong incompatibility with the ecosystem based on native extensions

    • theory (usually, Python JIT cannot see inside extensions)
    • Python C API
  • Not magic

Bad situation for perf oriented Python implementations

  • cursed (many dead projects)
  • too small usage
  • no/less funding
  • the language makes it difficult
  • the current C API makes it impossible

Warning

  • PyPy is clearly in trouble
  • CPython JIT: Faster CPython no longer funded by Microsoft

Python C API not adapted for perf oriented Python VM

  • Strong and wrong hypotheses about the implementations

    • No opaque PyObject: direct access to the structure

    • Reference counting (Py_INCREF, Py_DECREF)

    • PyObject* pointers incompatible with a moving GC

  • Functions defined in extensions not typed!

  • Useless boxing/unboxing

arr = np.ones(100)
result = 0
for i in range(len(arr)):
  result += arr[i]

For arr[i]

native int -> Py int -> C int -> C float -> Py float -> native float
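
The same pattern can be observed without NumPy. The sketch below swaps in the standard library `array` type (an extension type written in C in CPython) for `np.ones(100)`, so that it runs with no third-party dependency. Each `arr[i]` crosses the extension boundary with boxed objects.

```python
from array import array

# 100 doubles stored as raw 8-byte values, similar in spirit to np.ones(100).
arr = array("d", [1.0] * 100)

result = 0
for i in range(len(arr)):
    # i is a Python int, converted to a C index inside the extension;
    # the C double read from the buffer is boxed into a new Python float.
    item = arr[i]
    result += item

print(type(item).__name__, arr.itemsize, result)
```

Even if a JIT keeps `i` and `result` as native values inside the loop, the call into the extension forces it to build real Python objects on the way in and to unpack them again on the way out.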

Possible solutions

  • HPy
  • Better limited API and stable ABI
  • New Python Native Interface (PyNI)

HPy

  • By PyPy and GraalPy devs

  • A new C API

    • modernized: avoid many “small” issues
    • context argument
    • references (handles) instead of PyObject*
    • HPy_Dup/HPy_Close instead of Py_INCREF/Py_DECREF

    Note

    Compatible with moving garbage collection
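
Why handles help can be sketched with a toy model in pure Python. Everything here (`HandleTable`, `new`, `dup`, `close`, `deref`, `move_everything`) is hypothetical and only mimics the idea; it is not the HPy API. The extension only holds small opaque integers, so the interpreter is free to relocate objects.

```python
class HandleTable:
    """Toy model of opaque handles (hypothetical, not the HPy API)."""

    def __init__(self):
        self._storage = {}      # address -> object, stands in for the heap
        self._handles = {}      # handle -> address
        self._next_handle = 1
        self._next_address = 1000

    def new(self, obj):
        address = self._next_address
        self._next_address += 8
        self._storage[address] = obj
        return self._open(address)

    def _open(self, address):
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = address
        return handle

    def dup(self, handle):
        """Open a second, independent handle to the same object."""
        return self._open(self._handles[handle])

    def close(self, handle):
        """Release one handle; the object stays alive while others exist."""
        address = self._handles.pop(handle)
        if address not in self._handles.values():
            del self._storage[address]

    def deref(self, handle):
        return self._storage[self._handles[handle]]

    def move_everything(self):
        """Simulate a moving GC: every object gets a new address."""
        relocation = {}
        new_storage = {}
        for address, obj in self._storage.items():
            new_address = address + 100_000
            relocation[address] = new_address
            new_storage[new_address] = obj
        self._storage = new_storage
        self._handles = {h: relocation[a] for h, a in self._handles.items()}


table = HandleTable()
h1 = table.new([1, 2, 3])
h2 = table.dup(h1)
table.move_everything()          # addresses change, handles stay valid
assert table.deref(h1) is table.deref(h2)
table.close(h1)
assert table.deref(h2) == [1, 2, 3]
table.close(h2)
```

With a raw `PyObject*`, the extension would hold the address itself, and moving the object would leave a dangling pointer. With handles, only the table owned by the interpreter needs updating.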

HPy: One API for two target ABIs

  • standard ABI (example cpy311-cpy311)

  • universal ABI

    • compatible with different Python versions and implementations
    • debug mode without recompilation

HPy difficulties

  • no CPython core devs
  • no more funding
  • general uncertainty about where the C API is going
  • no adoption by big projects
  • NumPy 1 ported but … NumPy 2

Stalled :-(

Future of limited API and stable ABI

  • Limited API: a subset of the C API

  • Stable ABI

    • compatible with different Python versions
    • for example, cp312-abi3 is compatible with Python >=3.12 (so also with 3.15)
  • Evolution for cp315-abi3 (Python 3.15)

    • 2 incompatible build modes for CPython

      (Free-threading and GIL-enabled)

    • Extensions compatible with both modes (PyObject fully opaque)
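
The version rule behind a tag such as cp312-abi3 can be sketched as follows. The helper `abi3_wheel_supports` is hypothetical and written for illustration only. It ignores the platform tag and the GIL-enabled versus free-threaded distinction mentioned above, and real installers rely on dedicated libraries rather than on code like this.

```python
import re


def abi3_wheel_supports(tag, version):
    """Return True if a 'cpXY-abi3' tag accepts CPython `version` (major, minor).

    Hypothetical helper for illustration; it only encodes the rule
    "same major version, minor version at least the one in the tag".
    """
    match = re.fullmatch(r"cp(\d)(\d+)-abi3", tag)
    if match is None:
        raise ValueError(f"not an abi3 tag: {tag!r}")
    minimum = (int(match.group(1)), int(match.group(2)))
    return version[0] == minimum[0] and version >= minimum


for version in [(3, 11), (3, 12), (3, 15)]:
    print(version, abi3_wheel_supports("cp312-abi3", version))
```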

Python Native Interface

Old plan

  • Petr’s idea (CPython core dev, C API WG)

  • fully new, clean and complete CPython API

  • “native” (different languages, in particular C and Rust)

  • not focused on human usability (for tools like Cython)

  • C API still supported

Recent plan (after the C API Summit, EuroPython 2025)

Same as old, plus

  • strongly inspired by HPy (context argument, handles, …)

  • universal ABI

PyNI: interesting differences with HPy

  • driven by CPython needs

    (in particular C API WG and Faster CPython)

  • CPython, PyPy and GraalPy devs together

  • official (PEPs)

    • natively supported by CPython
    • much clearer for package maintainers

PyNI: possible steps (with funding!)

  • 2025

  • 2026

    • First PyNI in CPython (3.15?)
  • 2027

    • Cython, PyO3, pybind, Pythran, …
    • PyPy, GraalPy support PyNI universal ABI
    • NumPy migration -> Universal NumPy wheels
  • 2028

    • “NumPy Native Interface” (NumPyNI)
    • Cython/… using NumPyNI
  • 2029

    • Universal wheels for most scientific/data ecosystem

Funding and supports

Python very badly funded!

  • Especially for such long-term ecosystem-wide projects

  • Companies and public sector

  • Reasonable investments & not so expensive!

  • Lack of mechanisms to allow one to support Py & co

    (think research projects, CNRS, CEA, universities, …)

  • Organization problem, community problem

    • We should not depend only on Meta, Microsoft, Google, …
    • We should have people paid to work on Py & co

Conclusions

  • Current Python C API inhibits perf improvements

  • Good and fast alternative Python implementations

  • A technical solution: Python Native Interface

    • A long and ecosystem-wide project

    • CPython is technically ready

    • Good compatibility between extensions and fast Python interpreters

    • A lot of positive effects in a few years

      Time, €, CO₂; more Python and better Python, …

  • Funding, support and organization

    CPython, PyPy, Cython, NumPy, …