Open science for fluid dynamics?

An example: building open datasets of stratified turbulence with Fluidsim

Pierre Augier$^1$, Jason Reneuve$^1$, Vincent Labarre$^2$

  1. Université Grenoble Alpes, CNRS, Grenoble INP, LEGI, Grenoble
  2. Université Côte d'Azur, Observatoire de la Côte d'Azur, CNRS, Laboratoire Lagrange, Nice
               
28 June 2022
$$\require{begingroup}\require{newcommand}$$$$\gdef\vec#1{\boldsymbol{#1}}$$$$\gdef\R{\mathcal{R}}$$$$\gdef\FhR{(F_h,\ \R)}$$

Open science: a very broad concept

General philosophy: more transparency, public access, collaboration, reuse, ... (a reaction to how research works)

Very different (sometimes independent) actions / practices:

Taken from UNESCO presentation of 17 February 2021 via Wikipedia

Often considered for subjects with high societal impact.

Supported by very different actors for different reasons

Some costs but not so much money

Goals and principles of open software

"Libre" (copyleft licenses, e.g. GPL)

  • common goods
  • independence
  • less power to big companies

Open-source (permissive licenses, BSD / MIT)

  • efficiency
  • share costs / collective dynamics
  • used by big companies

Implicit union of two very different points of view, with great technical successes (e.g. Linux, Python, ...)

Still big issues with funding and economic models!

Open science: different goals and principles

(https://en.wikipedia.org/wiki/Open_science)

  • common goods (as for free software)

  • public access good for research and society

  • science funded by public money, so results should be public / "Public money, public code" (Free Software Foundation Europe)

  • efficiency of research processes (using open source methods)

  • better research evaluation

  • "slow science" (publish less, focus on quality and long term)

  • ...

Focus on open software and open data

FAIR principles (for data)

Findable, Accessible, Interoperable, Reusable

Corresponding concept for software: FAIR4SW

An example: open datasets of stratified turbulence

Flows influenced by a stable density stratification

Why interesting?

  • Explaining statistical measurements in the atmosphere and oceans

  • Modeling / measuring mixing

Summary

  • Internal gravity waves (IGW) and "vortices"

  • Horizontal Froude number $F_h$ and buoyancy Reynolds number $\R = Re {F_h}^2$

  • Garrett and Munk spectra explained as IGW

  • LAST regime $F_h<0.02$, $\R>20$: downscale energy cascade and anisotropic spectra
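To make the two control parameters concrete, here is a minimal Python sketch (an illustration, not Fluidsim code). It assumes the classical definitions $F_h = U/(N L_h)$ and $Re = U L_h/\nu$, with $U$ and $L_h$ horizontal velocity and length scales, $N$ the Brunt-Väisälä frequency and $\nu$ the kinematic viscosity, so that $\R = Re\,{F_h}^2 = U^3/(\nu N^2 L_h)$. The LAST thresholds are those given above; the function names are hypothetical.

```python
def stratified_numbers(U, L_h, N, nu):
    """Return (F_h, Re, R) from dimensional scales (SI units).

    Assumed definitions: F_h = U / (N L_h), Re = U L_h / nu, R = Re F_h**2.
    """
    F_h = U / (N * L_h)
    Re = U * L_h / nu
    R = Re * F_h**2
    return F_h, Re, R


def is_last_regime(F_h, R, F_h_max=0.02, R_min=20.0):
    """True if (F_h, R) satisfies the LAST criteria F_h < 0.02 and R > 20."""
    return F_h < F_h_max and R > R_min


# Illustrative values: U = 0.1 m/s, L_h = 1 km, N = 1e-2 s^-1, nu = 1e-6 m^2/s
F_h, Re, R = stratified_numbers(U=0.1, L_h=1000.0, N=1e-2, nu=1e-6)
print(f"F_h = {F_h:.3g}, Re = {Re:.3g}, R = {R:.3g}, LAST: {is_last_regime(F_h, R)}")
```

With these illustrative values, $F_h = 0.01$ and $\R = 10^4$, which falls inside the LAST regime.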

Figure: buoyancy field $b(x,z)$ (Brethouwer, Billant, Chomaz & Lindborg, 2007)


Questions

  • Regimes in $(F_h, \R)$ space?
  • Mixing?
  • Effect of forcing, universality?
  • Role of IGW? Nonlinearity?

It would be good to have "foundational" open datasets:

  • Different forcing schemes

  • Span the $\FhR$ space

  • Later: effects of "weak rotation" ($N/f \simeq 10$)?
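Spanning the $\FhR$ space amounts to inverting the definitions of the two parameters. Assuming the classical definitions $F_h = U/(N L_h)$ and $\R = Re\,{F_h}^2$ with $Re = U L_h/\nu$, and assuming the forcing fixes $U$ and $L_h$, each target point fixes the two remaining physical parameters: $N = U/(F_h L_h)$ and $\nu = U L_h {F_h}^2/\R$. A minimal sketch with hypothetical helper names, generating a log-spaced grid of targets:

```python
import itertools
import math


def logspace(start, stop, num):
    """Return num values evenly spaced in log10 between start and stop."""
    if num == 1:
        return [start]
    a, b = math.log10(start), math.log10(stop)
    return [10 ** (a + (b - a) * i / (num - 1)) for i in range(num)]


def params_for_targets(F_h, R, U=1.0, L_h=1.0):
    """Return (N, nu) giving the target (F_h, R) for fixed U and L_h.

    Assumes F_h = U / (N L_h) and R = Re F_h**2 with Re = U L_h / nu.
    """
    N = U / (F_h * L_h)
    nu = U * L_h * F_h**2 / R
    return N, nu


def design_grid(F_h_values, R_values, U=1.0, L_h=1.0):
    """List of hypothetical simulation parameter sets covering the grid."""
    grid = []
    for F_h, R in itertools.product(F_h_values, R_values):
        N, nu = params_for_targets(F_h, R, U, L_h)
        grid.append({"F_h": F_h, "R": R, "N": N, "nu": nu})
    return grid


for run in design_grid(logspace(1e-3, 1e-1, 3), logspace(10.0, 1000.0, 3)):
    print(run)
```

In practice, the highest reachable $\R$ is bounded by the numerical resolution, since smaller viscosities require finer grids.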

The first two datasets: (i) forced in vortices and (ii) forced in IGW.


  • Reproducibility: they have to be produced with open source software.

  • Reusable: self-documented standard file formats.

  • Reusable: software easy to extend.

  • Easily understandable/usable: not only raw states. They need "outputs", i.e. intermediate data for advanced plots, ideally integrated within the software: open software to read, load and plot the outputs.

Open source software in research

Generalist open source software used for research (e.g. Linux, ...)

Open source methods, tools and good practices for research software

  • Languages, ecosystems

  • Dev utilities (IDE, formatting, linting, ...)

  • Version control (Git, Mercurial, ...)

  • Web documentation from the code

  • Testing

  • Web platforms (issues, code review, communications, coverage, ...)

  • Continuous Integration (for testing, doc, building, distribution, ...)

$\Rightarrow$ Dramatically improve sustainability, maintainability, efficiency of the development process, usability, ...

$\Rightarrow$ Make collaborative open source research software possible
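As a small illustration of two of the practices listed above (web documentation generated from the code, and testing), here is a hypothetical function with a NumPy-style docstring. Documentation tools such as Sphinx (with the numpydoc or napoleon extension) can render such docstrings, and the example they contain doubles as a test run by the standard `doctest` module. The function itself is only an example, not part of any package mentioned here.

```python
def mean_kinetic_energy(vx, vy, vz):
    """Return the mean kinetic energy per unit mass of a velocity field.

    Parameters
    ----------
    vx, vy, vz : sequence of float
        Velocity components sampled at the same grid points.

    Returns
    -------
    float
        ``0.5 * mean(vx**2 + vy**2 + vz**2)``.

    Examples
    --------
    >>> mean_kinetic_energy([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
    0.5
    """
    if len(vx) == 0 or not (len(vx) == len(vy) == len(vz)):
        raise ValueError("components must be non-empty and of equal length")
    total = sum(u * u + v * v + w * w for u, v, w in zip(vx, vy, vz))
    return 0.5 * total / len(vx)


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # runs the docstring example as a test
```

The same check can be run without the `__main__` block with `python -m doctest my_module.py` (hypothetical file name), and it is easy to call from a continuous integration job.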

On the need for training and RSE

Adopting open source methods requires work, time and skills.

Training is important, but this work should not be done only by researchers.

RSE: Research Software Engineers

  • people who combine software expertise with an understanding of research.

  • can help a lot (see https://www.nature.com/articles/d41586-022-01516-2)

My 2 cents: groups of developers with research/lab experience, based in universities / departments and in close contact with the researchers, with enough people to be available and to really get things done, and preferably not on short-term contracts.

Community collaborative research software

Example of Astropy

  • $>$ 420 contributors (34 with more than 100 commits)
  • $>$ 3000 stars on GitHub
  • $>$ 20000 downloads of conda-forge/astropy 5.1 in 1 month

$\Rightarrow$ strong benefit of collective dynamics!

But this depends a lot on the scientific community. Nothing like that in fluid mechanics...

Back to strat. turb.: the FluidDyn project and Fluidsim

FluidDyn: a project to foster open source and Python in fluid mechanics. A set of open source collaborative Python packages: fluidsim, fluidimage, fluidlab, ...

Fluidsim: A nice piece of technology oriented towards the users/developers

  1. A framework to create CFD solvers around any numerical code (e.g. Snek5000, built on Nek5000)

  2. Pseudo-spectral Fourier solvers (ns2d, ns3d, ns3d.strat, sw1l, ...).

  • documented https://fluidsim.readthedocs.io
  • tested
  • dev hosted at https://foss.heptapod.net/fluiddyn/fluidsim
  • very efficient
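For readers unfamiliar with pseudo-spectral methods, the core idea can be sketched in 1D with NumPy: derivatives are computed exactly in Fourier space, products are computed in physical space, and the 2/3 rule removes aliasing errors. This is a generic illustration under simplifying assumptions (1D, periodic domain, NumPy FFTs); it is not Fluidsim's implementation, which works on 2D/3D fields and supports MPI parallelism.

```python
import numpy as np


def wavenumbers(n, L):
    """Angular wavenumbers of the rfft of n samples on a periodic domain [0, L)."""
    return 2 * np.pi * np.fft.rfftfreq(n, d=L / n)


def dealias(f_hat, n):
    """2/3 rule: zero the modes with index m > n // 3 (returns a copy)."""
    f_hat = f_hat.copy()
    f_hat[np.arange(f_hat.size) > n // 3] = 0.0
    return f_hat


def derivative(f, L):
    """Spectral derivative df/dx of a real periodic signal."""
    n = f.size
    df_hat = 1j * wavenumbers(n, L) * np.fft.rfft(f)
    if n % 2 == 0:
        df_hat[-1] = 0.0  # the Nyquist mode has no well-defined derivative
    return np.fft.irfft(df_hat, n=n)


def advection_term(u, L):
    """Pseudo-spectral u du/dx: derivative in Fourier space, product in physical space.

    Assumes u is already dealiased (only modes m <= n // 3), as in a solver
    that keeps its state truncated; the product is then truncated again.
    """
    n = u.size
    product = u * derivative(u, L)
    return np.fft.irfft(dealias(np.fft.rfft(product), n), n=n)


L = 2 * np.pi
n = 32
x = np.arange(n) * L / n
u = np.sin(x)
# For u = sin(x), u du/dx = sin(x) cos(x) = sin(2x) / 2
print(np.max(np.abs(advection_term(u, L) - 0.5 * np.sin(2 * x))))
```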

Small example of the user interface...

Open data: sharing data produced during research

Specific licenses for open data

Need to link the dataset with specific software versions

$\Rightarrow$ DOIs for the dataset and for the software version.
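One simple way to make this link explicit is a small machine-readable provenance file stored next to the data. The sketch below, using only the Python standard library, records the software name and version, optional DOIs, and SHA-256 checksums of every file; the file name `provenance.json` and the record layout are hypothetical choices, not a standard. The version string could be obtained with `importlib.metadata.version("fluidsim")` when the package is installed.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """SHA-256 checksum of a file, read in chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_provenance(data_dir, software, software_version,
                     software_doi=None, dataset_doi=None,
                     out_name="provenance.json"):
    """Write (and return) a hypothetical provenance record for data_dir."""
    data_dir = Path(data_dir)
    files = sorted(
        p for p in data_dir.rglob("*") if p.is_file() and p.name != out_name
    )
    record = {
        "software": {"name": software, "version": software_version, "doi": software_doi},
        "dataset_doi": dataset_doi,
        # Checksums let users check that they work with the published files
        "files": {p.relative_to(data_dir).as_posix(): sha256sum(p) for p in files},
    }
    (data_dir / out_name).write_text(json.dumps(record, indent=2, sort_keys=True))
    return record
```

For example, `write_provenance("my_dataset", "fluidsim", importlib.metadata.version("fluidsim"))` would write `my_dataset/provenance.json` (the directory name is hypothetical); the DOI fields can be filled once the repository has issued them.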

Databases: web repositories for datasets, software versions and publications

  • Generalist: (CERN),

  • Specialist: Dataterra (Aeris for the atmosphere, Odatis/Seanoe for the oceans, ...), Turbase (European, no longer funded), Johns Hopkins Turbulence Database

  • Software: Software Heritage
  • Publications: (CNRS)

Cost of storing and sharing, size limitation

  • Moderate size datasets (< 50 GB)
  • Big datasets (a few TB), e.g. datasets containing raw states of turbulent flows
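A quick estimate shows why raw states reach the TB scale. Assuming a 3D state stored as 4 real fields (3 velocity components and the buoyancy) in double precision, one state weighs $n_x n_y n_z \times 4 \times 8$ bytes. The numbers of fields and the precision are assumptions for illustration, not the layout of the actual files.

```python
GiB = 1024**3


def state_size_bytes(nx, ny, nz, n_fields=4, bytes_per_value=8):
    """Size of one raw state: n_fields real fields on an nx * ny * nz grid."""
    return nx * ny * nz * n_fields * bytes_per_value


for n in (256, 512, 1024, 2048):
    print(f"{n}^3 grid: {state_size_bytes(n, n, n) / GiB:g} GiB per state")
```

For example, 40 simulations with a single $1024^3$ state each would already represent more than 1 TB, far beyond the < 50 GB of a moderate-size dataset.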

Back to our example on stratified turbulence

For each forcing, 2 datasets:

  • A moderate size dataset (< 50 GB) for "basic" plots

    • Currently on a Dropbox-like service (Data in Mycore)

    • To be moved to Zenodo before publication of the first paper

  • A big dataset (a few TB, raw states) for further investigations and restarts

    Where???

Summary on strat. turb. forced in vortices

More than 40 simulations available as open datasets!

The "small" dataset is very convenient to create figures and work on publications (demo)

Fine characterization of the 5 regimes (viscosity-affected, LAST, optimal, weakly stratified, passive scalar), in particular with spatial, temporal and spatiotemporal spectra and spectral energy budgets.

Foundational datasets (comparison with other forcings, restarts for new outputs, better resolution, ...)
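As an illustration of the simplest of these diagnostics, a one-dimensional horizontal spectrum can be sketched with NumPy. This is a generic sketch, not the code used for the datasets; it assumes a real field periodic along its last axis (taken as $x$), averages over the other axes, and uses a normalization such that the spectrum integrates to $\langle u^2 \rangle/2$ (Parseval).

```python
import numpy as np


def spectrum_kx(u, Lx):
    """One-sided 1D spectrum E(kx) of a real field, periodic along its last axis.

    Normalized so that sum(E) * dkx == 0.5 * mean(u**2) (Parseval).
    """
    nx = u.shape[-1]
    u_hat = np.fft.rfft(u, axis=-1) / nx
    E = 0.5 * np.abs(u_hat) ** 2
    E[..., 1:] *= 2  # fold the negative wavenumbers onto the positive ones
    if nx % 2 == 0:
        E[..., -1] /= 2  # the Nyquist mode has no negative counterpart
    E = E.reshape(-1, E.shape[-1]).mean(axis=0)  # average over the other axes
    dkx = 2 * np.pi / Lx
    kx = dkx * np.arange(E.size)
    return kx, E / dkx


rng = np.random.default_rng(0)
u = rng.standard_normal((8, 16, 32))  # hypothetical (z, y, x) field
kx, E = spectrum_kx(u, Lx=2 * np.pi)
print(E.sum() * (kx[1] - kx[0]), 0.5 * np.mean(u**2))
```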

Conclusions on Open Science in fluid mechanics

  • Open science is a very broad concept. Many different actions/goals/supporters

  • Advantages (reproducibility, extensibility, ...) and drawbacks (costs, work, time). Open science should be used reasonably. Not everything should be open!

  • Costs of "closed science" and bad practices in research software

  • Open data and open software strongly coupled

  • Research Software Engineers (RSE): very important for the future of science

  • Usage of open datasets?

  • More open source collaborative software in fluid mechanics?

Links

Research institutions

  • https://www.science-ouverte.cnrs.fr/

  • https://www.ouvrirlascience.fr

RSE (Research Software Engineers)

  • https://researchsoftware.org/

  • https://society-rse.org

  • https://www.nature.com/articles/d41586-022-01516-2

  • https://ec.europa.eu/info/sites/default/files/research_and_innovation/importance_of_software_in_research.pdf