Python numerical ecosystem: deep technical issues soon fixed through the HPy project?

I gave a presentation focused on HPy and the technical underpinnings of the numerical Python ecosystem.

The slides are available here.

Dynamic diagram showing two vicious circles related to the CPython C API. CPython is the reference Python implementation; PyPy and GraalPy are examples of alternative Python implementations. The last image shows the positive effect of HPy.

Abstract

Despite its massive use for numerical computation, the Python ecosystem is still largely dominated by CPython, the reference implementation, which is known to be quite slow. We will show how problems in the CPython C API feed vicious circles that block, on the one side, CPython improvements and, on the other side, the usage and development of alternative Python implementations (like PyPy and GraalPy).

The HPy project aims at (i) designing a new C API for Python and (ii) making a smooth transition from the CPython C API to the new HPy API possible. We will discuss how this ambitious project could have a great positive impact on the numerical Python ecosystem and its users. We will present the current state of the project in 2023 and, finally, how you can help HPy make this dream a reality as soon as possible.

Personal thoughts about HPy

After working on this presentation, I learned enough to form some personal opinions on the HPy project, and I felt the need to write them down here to share them. First, let me explain where I am speaking from. I would describe myself as an advanced Python user specialized in scientific computing. I maintain a few packages useful in my field of research (turbulence and instabilities) and teach Python to students (Master and PhD) and university staff. I also wrote Transonic, a nice tool based on Pythran to accelerate Numpy/Python code. I am not personally involved in HPy, even though I try to help as much as I can.

I think that a successful HPy would strongly benefit users of the scientific Python ecosystem. I see that HPy is slowly maturing and will soon reach an important milestone (HPy 0.9). However, the future of HPy is still uncertain. There is no broad consensus that HPy is the way forward for the ecosystem, and the project has few supporters, including in terms of funding.

The Python community should soon see HPy as a fundamental building block for the near future of Python

HPy is a very ambitious project. The proposal is to convert the most important packages to this new API. The benefit for users would be huge and it would completely change the ecosystem (from a hyper-dominant CPython to a much more balanced state where different interpreters coexist).

Currently (in 2023), I think HPy is still considered by many people interested in the subject as just an interesting long-term experiment. I think it is important that “the community” soon starts to consider it as a fundamental building block of the near future of Python. Getting the transition to HPy done in a reasonable time requires such a change of perspective. The amount of work is large enough that a collective effort is needed.

It seems to me that it is starting to be accepted that deep changes are needed to fix some aspects of the legacy CPython C API. HPy is one of the proposed strategies. On the CPython side, there are other proposals. Victor Stinner is a CPython core developer (paid by Red Hat to work on CPython) who has worked a lot on improving the CPython C API. He supports the HPy project as a long-term solution, but he has also proposed improving the CPython C API and the “limited API”.

Recently, Mark Shannon (a CPython core developer paid by Microsoft to work on the Faster CPython project) launched another project to design a new C API for Python. Compared to HPy, this project seems much more centered on CPython, since one of its goals is to use this new API inside CPython itself. Mark’s proposal takes some ideas from HPy (for example, all functions take an opaque context as their first argument, and Python objects are referred to through opaque references, similar to HPy handles) but also differs on some points.
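To make “an opaque context as first argument” and “opaque references” a bit more concrete, here is a toy Python model of the idea. It is only an illustration under stated assumptions: `ToyContext`, `toy_add` and `InvalidHandle` are hypothetical names, and this is neither HPy’s real API (which is in C, with functions such as `HPy_Dup` and `HPy_Close` taking an `HPyContext *` first) nor Mark Shannon’s proposal. The point it shows is that extension code only manipulates small tokens, so each interpreter stays free to store and move the real objects however it wants.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


class InvalidHandle(RuntimeError):
    """Raised when code uses a handle that was closed or never existed."""


@dataclass
class ToyContext:
    """HYPOTHETICAL toy model of an opaque context owning a handle table.

    Extension code only ever receives small integers ("handles"); the actual
    objects, and how they are stored, stay private to the context.
    """

    _objects: Dict[int, Any] = field(default_factory=dict)
    _next_handle: int = 1

    def new_handle(self, obj: Any) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = obj
        return handle

    def deref(self, handle: int) -> Any:
        if handle not in self._objects:
            raise InvalidHandle(f"handle {handle} is closed or unknown")
        return self._objects[handle]

    def dup(self, handle: int) -> int:
        # A duplicate is a distinct handle to the same object; each handle
        # must be closed separately (in the spirit of HPy_Dup / HPy_Close).
        return self.new_handle(self.deref(handle))

    def close(self, handle: int) -> None:
        self.deref(handle)  # raises InvalidHandle if already closed
        del self._objects[handle]

    def open_handles(self) -> int:
        return len(self._objects)


def toy_add(ctx: ToyContext, left: int, right: int) -> int:
    """API-style function: context first, handles in, a new handle out."""
    return ctx.new_handle(ctx.deref(left) + ctx.deref(right))


ctx = ToyContext()
a, b = ctx.new_handle(2), ctx.new_handle(3)
total = toy_add(ctx, a, b)
print(ctx.deref(total))  # prints 5
```

Note that two handles to the same object are different integers: unlike a raw `PyObject *`, a handle says nothing about object identity or memory address.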

It is worth noting that HPy is a lot more mature than the alternative proposals. Some complex real-world packages have been ported to HPy (for example Matplotlib) and already run well with PyPy, GraalPy and CPython. HPy will soon release its first stable version (Milestone ABI version 1), which will be sufficient to port the Numpy extension providing the np.ndarray type (numpy.core._multiarray_umath) so that a Game of life benchmark can be run.

How can the Python community adopt and support HPy?

I may be biased, but it seems to me that it is the scientific/data Python community that would benefit most from a successful HPy. Being able to run our programs with PyPy and GraalPy would clearly increase what can be done in Python. So I think HPy should first focus on the most popular packages for this community, namely Numpy, Matplotlib, Pandas and Scipy.

I imagine that some CPython core developers, and potentially the Python Steering Council, may tend to think that it is neither very important nor urgent to foster the goal of making Python a language with several first-class implementations specialized in different tasks. It is therefore good for HPy not to depend too much on their support.

Therefore, the HPy project should first aim to be adopted by the Numpy/Scipy community. The first step is to have a Numpy fork based on a stable HPy release that demonstrates that the promises of HPy are serious. From what I see following the HPy repository, HPy should soon reach this very important milestone. We should then have a few benchmarks showing that Numpy written with HPy is really (i) as efficient on CPython as without HPy and (ii) at least as efficient on PyPy and GraalPy as on CPython. We should also show some examples mixing pure Python and Numpy for which PyPy and GraalPy are much more efficient than CPython.
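Benchmarks of this kind boil down to running the same script under each interpreter and comparing timings. As a minimal, stdlib-only sketch of such a harness (an assumption of how it could look, not an existing HPy benchmark), the script below times a pure-Python Game of Life and reports which interpreter ran it. It deliberately avoids Numpy so it runs on any interpreter today; it is not the Numpy-based Game of life benchmark mentioned above. It could be run with, e.g., `python bench.py`, `pypy3 bench.py` and `graalpy bench.py`.

```python
import platform
import timeit
from collections import Counter
from typing import Set, Tuple

Cell = Tuple[int, int]

# Standard glider (moves by (+1, +1) every 4 generations) and R-pentomino
# (a small pattern that evolves chaotically for many generations).
GLIDER: Set[Cell] = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
R_PENTOMINO: Set[Cell] = {(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)}


def life_step(live: Set[Cell]) -> Set[Cell]:
    """One Game of Life generation on an unbounded grid of live cells."""
    counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


def run(steps: int, live: Set[Cell]) -> Set[Cell]:
    for _ in range(steps):
        live = life_step(live)
    return live


def main() -> None:
    steps, repeat = 200, 5
    # Taking the best of several runs reduces (but does not remove) the
    # influence of JIT warm-up on PyPy/GraalPy.
    best = min(timeit.repeat(lambda: run(steps, R_PENTOMINO), number=1, repeat=repeat))
    print(
        f"{platform.python_implementation()} {platform.python_version()}: "
        f"best of {repeat} runs: {best * 1e3:.1f} ms for {steps} steps"
    )


if __name__ == "__main__":
    main()
```

A convincing benchmark for HPy would of course replace the set-based board with Numpy arrays and a mix of Numpy calls and Python loops, which is exactly what requires an HPy port of Numpy on PyPy and GraalPy.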

At this point, we should communicate (more on communication later) with the goal of gradually getting the Python community to adopt and support HPy. There will be different important ways to support HPy (more on funding later).

Then, HPy will have to demonstrate that a new Numpy C API compatible with HPy can be designed and implemented. A good demonstrator will be the HPy Matplotlib fork fully ported to HPy, i.e. using the new Numpy HPy C API.

With some more demonstration that HPy could be good enough for Cython (with a minimal Cython backend) and some more work on packaging, the HPy project would be ready to be used in the main Numpy and Matplotlib repositories. Adoption of HPy by Numpy and Matplotlib would of course send strong signals to other projects, a bit like the recent adoption of Pythran by Scipy and Scikit-image.

Note that all this could be done without the explicit support of CPython core developers and the Python Steering Council. However, such support could ease and accelerate the process.

Note that timing is important. When will end-users be able to install with pip a universal wheel of Numpy, usable on different CPython versions and efficient on alternative Python implementations? For users (and for me), the exact technology used (HPy or another solution) does not matter, but it makes a big difference whether the answer is 1, 2 or 5 years. This is IMO a good argument for HPy, which already exists and is not far from being ready for production use.

Communication about HPy

Currently, external communication about HPy is mostly done through the project README, the project website/blog and the project documentation.

Most of these documents target people working with the CPython C API, and this technical documentation is IMO remarkably clear and educational. Unfortunately, HPy communication is less effective at motivating project maintainers and end-users to support the project. There is a very interesting page, HPy overview, which nicely presents the motivation and goals. However, some high-level points of view are IMHO missing:

  • the general picture (deep technical issues of our ecosystem, which block the use and the development of alternative Python implementations)

  • the targeted new state for the Python ecosystem (and the direct consequences for users)

  • the proposed large-scale transition of nearly the whole Python ecosystem to HPy.

There are also very few mentions of HPy in Python conferences and on social media. Victor Stinner mentions the project in some of his talks, for example Python Performance: Past, Present and Future at EuroPython 2019 and Introducing incompatible changes in Python at PyCon US 2023. However, there won’t be any talks centered on HPy at PyCon 2023 (except at the Language Summit).

It is interesting to compare how HPy and the Faster CPython project communicate. The Faster CPython project presents very high ambitions (a 5x speedup for CPython within a few years) and explains its detailed plan through many channels, for example a talk by Mark Shannon at PyCon 2023. In contrast, HPy is very shy and conservative. The documentation repeatedly states that “HPy is still in the early stages of development” and that “there is still a long road before HPy is usable for the general public”. This is partly true, but it is neither very positive nor motivating. The concrete consequences for end-users are somewhat hidden behind fairly technical information.

However, with all the respect that I have for the people working on the Faster CPython project, a successful HPy project (i.e. the most popular packages having extensions that use HPy) would lead to a deeper and better improvement of the Python ecosystem. In particular, it would bring great speedups to many users, who would be able to use Python implementations really “5 times faster” than Python 3.10 for many real-world applications. Having specialized interpreters for different tasks would be much better than a single interpreter that still has to support its problematic legacy C API for years. Without improvements to the CPython C API, and under the constraint of not degrading the performance of extensions using the legacy C API, it is becoming clear that the “x5 faster” target of the Faster CPython project is very ambitious. Thus, a successful HPy would actually help the Faster CPython project a lot.

Increase HPy funding?

The Faster CPython project is funded by Microsoft, with a group of highly skilled developers working on it. This is great! Fortunately, HPy is also supported by a company (Oracle) as part of the Graal project.

However, given the ambition of the project and the potential positive impact for users, the overall support from the community is still very small. It’s pretty clear that to succeed and produce practical results for end-users within a few years, HPy needs more people working on it.

There is an Open Collective project through which it is possible to give money to HPy. However, without a clear roadmap, targets for how much money is needed, an explanation of how this money would be used, and effective fundraising campaigns, it cannot be efficient. Given the potential of the project, I’m sure that some Python users could be motivated to give a little to the project and that some companies would like to become sponsors and give money or some developer time. I guess it would be possible to raise enough, for example, to pay a developer to work full time on HPy and the HPy transition for 2 years.

It seems to me that, given the potential impact of the project, we should first think about what is needed for the overall project and how much it would cost. Then, we will see how we can get this money. With good communication, HPy can get support from thousands of donors (individuals, scientific projects, companies, etc.). I guess getting “a lot of money” (actually quite reasonable amounts) and people to work on the HPy project should not be an issue if competent people are involved in organizing this collective funding.

Potential role of CPython and Python Steering Council

The Numpy port based on a stable version of HPy will be a clear demonstration that HPy should not be considered only as an interesting long-term experiment. HPy could change the Python experience of millions of end-users within a few years.

I wrote that the HPy project does not directly depend on the support of CPython core developers and the Python Steering Council. However, some clear messages could be sent to help the project. Moreover, I guess a few decisions about CPython could help HPy.

For example, CPython currently cannot import a universal HPy extension without a dummy Python file. This workaround works but is quite inconvenient, since (i) it is not great to add automatically generated Python files (which should not be tracked by version control) to a Python package and (ii) it is incompatible with tools that produce extensions from a Python file (namely Cython in pure Python mode and Pythran). It would be reasonable for CPython, at least from version 3.12, to recognize a universal HPy extension and to know that it needs to be loaded with the hpy.universal package. I guess that such small CPython adaptations for HPy would require a PEP, but this could be justified given the potential positive impact of the project on the Python language, ecosystem and community.
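At the level of the import system, “recognizing a universal extension” could mean something like the following meta path finder. This is a minimal sketch under loud assumptions: the `.hpy.so` suffix, the class names and the `load` callback are all hypothetical, and the callback merely stands in for what hpy.universal would do (its real API is not reproduced here). Only `importlib.abc.MetaPathFinder`, `importlib.abc.Loader` and `importlib.util.spec_from_file_location` are real standard-library APIs.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import os
import sys
import types
from typing import Callable, Optional, Sequence

# HYPOTHETICAL file suffix for a universal HPy extension (real names may differ).
UNIVERSAL_SUFFIX = ".hpy.so"

# HYPOTHETICAL callback standing in for hpy.universal:
# given a module spec, return an initialised module object.
LoadFunc = Callable[[importlib.machinery.ModuleSpec], types.ModuleType]


class UniversalExtensionLoader(importlib.abc.Loader):
    def __init__(self, load: LoadFunc) -> None:
        self._load = load

    def create_module(self, spec: importlib.machinery.ModuleSpec) -> types.ModuleType:
        # Delegate module creation to the pluggable load function.
        return self._load(spec)

    def exec_module(self, module: types.ModuleType) -> None:
        # Nothing left to execute: the load function returns a ready module.
        pass


class UniversalExtensionFinder(importlib.abc.MetaPathFinder):
    def __init__(self, load: LoadFunc) -> None:
        self._loader = UniversalExtensionLoader(load)

    def find_spec(
        self,
        fullname: str,
        path: Optional[Sequence[str]],
        target: Optional[types.ModuleType] = None,
    ) -> Optional[importlib.machinery.ModuleSpec]:
        name = fullname.rpartition(".")[2]
        # Top-level imports search sys.path; submodules search the package __path__.
        for entry in sys.path if path is None else path:
            if not isinstance(entry, str):
                continue
            candidate = os.path.join(entry or os.getcwd(), name + UNIVERSAL_SUFFIX)
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(
                    fullname, candidate, loader=self._loader
                )
        return None  # let the regular finders handle everything else
```

The stock path-based finder only matches `name` plus one of `importlib.machinery.EXTENSION_SUFFIXES`, so a file named like `mod.hpy.so` is invisible to it; that is the gap the dummy Python file currently fills. With a finder like this installed in `sys.meta_path` (by CPython itself, or by a hypothetical site hook), no stub file would be needed, which would also remove the conflict with Cython pure Python mode and Pythran.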

It seems to me that explicit support from the CPython project, the Python Steering Council and/or the Python Software Foundation is not mandatory for HPy but could greatly speed up the transition towards the globally better Python ecosystem that HPy would allow.