flowchart LR
A["Step 1 (Python)"] -->|Pass data as Zarr or HDF5| B["Step 2 (R)"]
Case study of the rhdf5 and Rarr Bioconductor packages
February 25, 2026
Zarr is “cloud-native” and easily parallelizable.
Both Zarr and HDF5 allow reading individual chunks of a file.
But if the file is stored remotely, an HDF5 file typically needs to be downloaded in full first, while Zarr can fetch only the relevant chunks.
/tmp/Rtmp01ndDB/file3f96509e6ae5.zarr
├── 0
│   ├── 0
│   ├── 1
│   └── 2
├── 1
│   ├── 0
│   ├── 1
│   └── 2
└── 2
    ├── 0
    ├── 1
    └── 2
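To make the chunk-level access concrete, here is a minimal sketch of how a reader could work out which chunk files a rectangular selection touches. The function name `chunk_keys`, the 30 x 30 array shape, and the 10 x 10 chunk size are assumptions chosen for illustration; the `/`-separated keys mirror the nested directory layout shown above.

```python
from itertools import product


def chunk_keys(selection, chunks, separator="/"):
    """Return the keys of all chunks overlapping a rectangular selection.

    selection: one (start, stop) pair per dimension, stop exclusive.
    chunks:    chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunks):
        if not 0 <= start < stop:
            raise ValueError("each (start, stop) must satisfy 0 <= start < stop")
        first = start // size          # index of the first chunk touched
        last = (stop - 1) // size      # index of the last chunk touched
        per_dim.append(range(first, last + 1))
    return [separator.join(map(str, idx)) for idx in product(*per_dim)]


# Hypothetical 30 x 30 array stored as a 3 x 3 grid of 10 x 10 chunks.
keys = chunk_keys([(0, 5), (12, 25)], chunks=(10, 10))
print(keys)  # ['0/1', '0/2']
```

Only the two listed files would need to be fetched from a remote store for this selection; the other seven chunks are never touched.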
Zarr is community-driven:


rhdf5: wrapping and vendoring the HDF5 C library.
Lots of C code, but also lots of thin wrappers handling R memory management and R/C data type conversions.
Native implementation of the Zarr spec in R.

Most prep steps and housekeeping are done in R. Only performance-critical steps are in C.
These steps should eventually also run in parallel or on GPU.
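As a rough illustration of what a native implementation involves, the sketch below writes and reads an uncompressed Zarr v2 array using only the Python standard library. It is not how Rarr is implemented. The helper names `write_zarr_v2` and `read_chunk`, the 6 x 6 example data, and the 2 x 2 chunk size are all hypothetical, and the sketch assumes little-endian `int32` values with no compressor or filters.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_zarr_v2(path, rows, chunks):
    """Write a 2-D list of ints as an uncompressed '<i4' Zarr v2 array."""
    path = Path(path)
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunks
    meta = {
        "zarr_format": 2,
        "shape": [n_rows, n_cols],
        "chunks": [c_rows, c_cols],
        "dtype": "<i4",
        "compressor": None,
        "filters": None,
        "fill_value": 0,
        "order": "C",
        "dimension_separator": "/",
    }
    path.mkdir(parents=True, exist_ok=True)
    (path / ".zarray").write_text(json.dumps(meta))
    for i in range(-(-n_rows // c_rows)):          # ceiling division
        for j in range(-(-n_cols // c_cols)):
            values = []
            for r in range(i * c_rows, (i + 1) * c_rows):
                for c in range(j * c_cols, (j + 1) * c_cols):
                    inside = r < n_rows and c < n_cols
                    # Edge chunks are padded with the fill value.
                    values.append(rows[r][c] if inside else meta["fill_value"])
            chunk_file = path / str(i) / str(j)
            chunk_file.parent.mkdir(exist_ok=True)
            chunk_file.write_bytes(struct.pack(f"<{len(values)}i", *values))


def read_chunk(path, i, j):
    """Read one chunk back as a list of rows, opening only that chunk file."""
    path = Path(path)
    meta = json.loads((path / ".zarray").read_text())
    c_rows, c_cols = meta["chunks"]
    raw = (path / str(i) / str(j)).read_bytes()
    flat = struct.unpack(f"<{c_rows * c_cols}i", raw)
    return [list(flat[r * c_cols:(r + 1) * c_cols]) for r in range(c_rows)]


store = Path(tempfile.mkdtemp()) / "example.zarr"
data = [[r * 6 + c for c in range(6)] for r in range(6)]
write_zarr_v2(store, data, chunks=(2, 2))
print(read_chunk(store, 1, 2))  # [[16, 17], [22, 23]]
```

The metadata handling and path bookkeeping are the "housekeeping" part. The packing and unpacking of chunk bytes is the kind of step a real implementation would move to compiled code.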
The Zarr specification is still “Python-biased” and includes many “NumPy-isms”.
This has improved since version 3.
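One example of such a NumPy-ism is the data type field: Zarr v2 metadata stores NumPy type strings such as `<i4`, which a non-Python implementation has to parse, whereas v3 uses language-neutral names such as `int32`. The sketch below translates between the two for a few common kinds. The function name `v2_dtype_to_v3` is hypothetical and the mapping is deliberately incomplete (no strings, datetimes, or structured types).

```python
# NumPy kind character -> prefix of the Zarr v3 data type name.
_KINDS = {"b": "bool", "i": "int", "u": "uint", "f": "float", "c": "complex"}


def v2_dtype_to_v3(typestr):
    """Translate a NumPy-style type string such as '<i4' into a v3-style name."""
    byteorder, kind, size = typestr[0], typestr[1], int(typestr[2:])
    if byteorder not in "<>|" or kind not in _KINDS:
        raise ValueError(f"unsupported type string: {typestr!r}")
    if kind == "b":
        return "bool"
    # NumPy sizes are in bytes, v3 names use bits.
    return f"{_KINDS[kind]}{size * 8}"


print(v2_dtype_to_v3("<i4"))  # int32
print(v2_dtype_to_v3(">f8"))  # float64
print(v2_dtype_to_v3("|u1"))  # uint8
```

Note that the byte order is dropped here: in v3 it is no longer part of the data type name and is described separately in the codec configuration.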
Shared problems:
Pros wrapping:
Faster to get an initial proof of concept.
No need to start from scratch.
Zarr conformance tests:
Low diversity of test datasets.
Artür discovered a lot of edge cases and missing features when implementing Zarr support in anndataR.
Cons vendoring (1/3):
* checking compiled code ... WARNING
Note: information on .o files is not available
File ‘/home/biocbuild/bbs-3.22-bioc/R/site-library/rhdf5/libs/rhdf5.so’:
  Found ‘__sprintf_chk’, possibly from ‘sprintf’ (C)
  Found ‘abort’, possibly from ‘abort’ (C)
  Found ‘rand_r’, possibly from ‘rand_r’ (C)
  Found ‘stderr’, possibly from ‘stderr’ (C)
  Found ‘stdout’, possibly from ‘stdout’ (C)

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs nor [v]sprintf. The detected symbols are linked into
the code but might come from libraries and not actually be called.
Cons vendoring (2/3):
Cons vendoring (3/3):
By vendoring, we can make sure all users have the same version.
vs wrapping a system library:
Convince libraries to stick more closely to semantic versioning
Now the case in HDF5 2.0.0!
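Why semantic versioning matters for wrapping a system library can be sketched as a compatibility check: if the library follows semver, a wrapper only needs to compare version numbers to decide whether the installed library is safe to use. The function names and the version numbers below are hypothetical and only illustrate the rule.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(built_against, installed):
    """Semver rule: same major version, and installed is not older."""
    built, found = parse_version(built_against), parse_version(installed)
    return found[0] == built[0] and found >= built


print(is_compatible("2.0.0", "2.1.3"))  # True
print(is_compatible("2.0.0", "3.0.0"))  # False
```

Without a semver guarantee, a check like this is not reliable, which is one reason vendoring a fixed version is attractive.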

Hugo Gruson; Huber Group Lab Meeting 02/2026