Pierre Augier$^1$, Jason Reneuve$^1$, Vincent Labarre$^2$
General philosophy: more transparency, public access, collaborations, reuse, ... (a reaction to how research currently works)
Very different (sometimes independent) actions / practices:
Often considered for subjects with high societal impact.
Supported by very different actors for different reasons
Some costs but not so much money
"Libre" (copyleft licenses, GPL)
Open-source (permissive licenses, BSD / MIT)
Implicit union of 2 very different points of view, with great technical successes (e.g. Linux, Python, ...)
Still big issues with funding and economic models!
(https://en.wikipedia.org/wiki/Open_science)
common goods (as free software)
public access good for research and society
science funded by public money so results should be public / "Public money, public code" (Free Software Foundation Europe)
efficiency of research processes (using open source methods)
better research evaluation
"slow science" (publish less, focus on quality and long term)
...
Flows influenced by a stable density stratification
Why interesting?
Explanation of statistical measurements in the atmosphere and oceans
Modeling / measurement of mixing
Summary
Internal gravity waves (IGW) and "vortices"
Horizontal Froude number $F_h$ and buoyancy Reynolds number $\mathcal{R} = Re \, {F_h}^2$
Garrett and Munk spectra explained as IGW
LAST regime $F_h<0.02$, $\mathcal{R}>20$: downscale energy cascade and anisotropic spectra
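The two control parameters and the LAST thresholds quoted above can be sketched in a few lines of Python. This is only an illustration: the dimensional definitions $F_h = U/(N L_h)$ and $Re = U L_h/\nu$ are the usual textbook ones and are assumed here (the slides do not state them), and the class and function names are hypothetical, not part of fluidsim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowParameters:
    """Dimensional parameters of a stratified flow (hypothetical helper, SI units)."""

    velocity: float  # characteristic horizontal velocity U (m/s)
    length: float  # characteristic horizontal length L_h (m)
    brunt_vaisala: float  # Brunt-Vaisala frequency N (rad/s)
    viscosity: float  # kinematic viscosity nu (m^2/s)

    @property
    def froude_h(self) -> float:
        # Assumed definition: F_h = U / (N L_h)
        return self.velocity / (self.brunt_vaisala * self.length)

    @property
    def reynolds(self) -> float:
        # Assumed definition: Re = U L_h / nu
        return self.velocity * self.length / self.viscosity

    @property
    def buoyancy_reynolds(self) -> float:
        # Definition from the slides: R = Re * F_h^2
        return self.reynolds * self.froude_h**2


def is_last_regime(p: FlowParameters, fh_max: float = 0.02, r_min: float = 20.0) -> bool:
    """Return True when F_h < fh_max and R > r_min (thresholds quoted in the slides)."""
    return p.froude_h < fh_max and p.buoyancy_reynolds > r_min


if __name__ == "__main__":
    # Illustrative values only, not taken from any dataset.
    p = FlowParameters(velocity=0.01, length=100.0, brunt_vaisala=0.01, viscosity=1e-6)
    print(p.froude_h, p.reynolds, p.buoyancy_reynolds, is_last_regime(p))
```

Keeping the thresholds as keyword arguments makes it easy to test other regime boundaries without touching the parameter definitions.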
Questions
Would be good to have
"Foundational" open datasets
Different forcing schemes
Span the $(F_h, \mathcal{R})$ parameter space
Later: effects of "weak rotation" $N / f \simeq 10$?
First 2 datasets: (i) forced in vortices and (ii) forced in IGW.
Reproducibility: datasets have to be produced with open source software.
Reusable: auto-documented standard file formats.
Reusable: software easy to extend.
Easily understandable/usable: not only raw states. Needs "outputs", i.e. intermediate data for advanced plots. Better if integrated within the software: open software to read, load and represent the outputs.
Languages, ecosystems
Dev utilities (IDE, formatting, linting, ...)
Version control (Git, Mercurial, ...)
Web documentation from the code
Testing
Web platforms (issues, code review, communications, coverage, ...)
Continuous Integration (for testing, doc, building, distribution, ...)
$\Rightarrow$ Dramatically improve sustainability, maintainability, efficiency in dev process, usability, ...
$\Rightarrow$ Allow collaborative open source research software
Adopting open source methods requires work, time and skills.
Training is important, but this work should not be done only by researchers.
Research Software Engineers (RSE), people who combine software expertise with an understanding of research, can help a lot (see https://www.nature.com/articles/d41586-022-01516-2)
My 2 cents: a group of developers with research/lab experience in universities / departments, in close contact with the researchers. Enough people to be available and able to really get things done. Better: no short-term contracts.
$\Rightarrow$ strong benefit of collective dynamics!
But this depends a lot on the scientific community. Nothing like that in fluid mechanics...
FluidDyn: a project to foster open source and Python in fluid mechanics. A set of open source collaborative Python packages: fluidsim, fluidimage, fluidlab, ...
Fluidsim: A nice piece of technology oriented towards the users/developers
A framework to create CFD solvers from arbitrary existing codes (e.g. Snek5000)
Pseudo-spectral Fourier solvers (ns2d, ns3d, ns3d.strat, sw1l, ...).
Specific licenses for open data
Need to link the dataset with specific software versions
$\Rightarrow$ DOIs for the dataset and for the software version.
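As a minimal sketch of what linking a dataset to a software version could look like, the record below stores both DOIs next to the data in a small JSON file. Everything here is hypothetical: the class, the field names, the placeholder DOIs, the version string and the license value are chosen for illustration and do not describe the actual metadata layout of the datasets discussed in this talk.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal metadata tying a dataset to the software that produced it (hypothetical)."""

    title: str
    dataset_doi: str
    software_name: str
    software_version: str
    software_doi: str
    license: str


def missing_fields(record: DatasetRecord) -> list:
    """Return the names of the fields that are empty or whitespace only."""
    return [name for name, value in asdict(record).items() if not value.strip()]


def to_json(record: DatasetRecord) -> str:
    """Serialize the record to a human-readable JSON string."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values only: real DOIs are assigned by the data repository.
    record = DatasetRecord(
        title="Stratified turbulence, forced in vortices",
        dataset_doi="10.0000/placeholder-dataset",
        software_name="fluidsim",
        software_version="0.0.0",
        software_doi="10.0000/placeholder-software",
        license="CC-BY-4.0",
    )
    assert not missing_fields(record)
    print(to_json(record))
```

A check like `missing_fields` can run before upload so that no dataset is published without a pointer to the exact software version.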
Databases: web repositories for datasets, software versions and publications
Specialist: Dataterra (Aeris for Atmos., Odatis/Seanoe for the Oceans, ...), Turbase (European, no more funding), Johns Hopkins Turbulence Database
Cost of storing and sharing, size limitation
For each forcing, 2 datasets:
A moderate size dataset (< 50 GB) for "basic" plots
Now on a Dropbox-like service (Data in Mycore)
On Zenodo before publication of the first paper
A big dataset (few TB, raw states) for further investigations and restarts
Where???
More than 40 simulations available as open datasets!
The "small" dataset is very convenient to create figures and work on publications (demo)
Fine characterization of the 5 regimes (Viscosity affected, LAST, Optimal, Weakly strat., Passive scalar) with in particular spatial, temporal and spatiotemporal spectra and spectral energy budget.
Foundational datasets (comparison with other forcing, restarts for new outputs, better resolution, ...)
Advantages (reproducibility, extensibility, ...) and drawbacks (costs, work, time). OS should be used reasonably. Not everything should be open!
Costs of "closed science" and bad practices in research software
Open data and open software strongly coupled
Research Software Engineers (RSE) very important for science in the future
Usage of open datasets?
More open source collaborative software in fluid mechanics?