Motivation
The grumpy package lets users read a wide variety of .npy
and .npz files in R. These file formats are commonly used
in Python for storing a single NumPy array and a zip archive of several
arrays (optionally compressed), respectively. By providing a convenient
interface for reading these files, grumpy gives R users direct access
to data that has been saved in these formats, facilitating
interoperability between Python and R.
We expect that users may want to perform some steps of their data analysis in Python and others in R.
It is thus important to be able to read and write files in both languages.
Note, however, that for large datasets we generally recommend more
advanced and performant formats such as Zarr. Zarr datasets are
supported, for example, by the {Rarr} Bioconductor
package.
Using grumpy
Most users will mainly rely on
grumpy::read_npy() and grumpy::read_npz() to
read .npy and .npz files, respectively. These
functions return R objects that are equivalent to the original
NumPy arrays, so the data can be manipulated and analyzed directly
in R.
read_npy(system.file("extdata", "test_2d.npy", package = "grumpy"))
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    3    6    9
#> [2,]    1    4    7   10
#> [3,]    2    5    8   11
Structured datatypes
Structured datatypes, in which each element of the array is a record
with named fields, are a notable special case. To keep the output consistent
and as conceptually close as possible to the original NumPy array,
grumpy returns a list of lists, with a dim()
attribute to preserve the original shape of the array.
The result behaves like a standard R array, but each element is a list holding the fields of the original structured datatype.
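To make the shape of such an object concrete, the sketch below builds one by hand in base R rather than reading it from a file. The field names `id` and `score`, the 2 x 2 shape, and the element order are all assumptions made for illustration. The actual fields and ordering that grumpy produces depend on the file being read.

```r
# Hypothetical stand-in for the result of reading a 2 x 2 structured array
# with two fields, "id" (integer) and "score" (double). Built by hand here;
# no file is read and grumpy is not called.
records <- list(
  list(id = 1L, score = 0.5),
  list(id = 2L, score = 1.5),
  list(id = 3L, score = 2.5),
  list(id = 4L, score = 3.5)
)
dim(records) <- c(2L, 2L)  # a list with a dim attribute, filled column-major

is.list(records)           # still a list ...
dim(records)               # ... but with array dimensions

# Array-style indexing returns one record, itself a list of fields.
rec <- records[[1, 2]]     # row 1, column 2
rec$id
rec$score
```

Single-bracket indexing such as `records[1, ]` keeps the list structure, while `[[` extracts one record.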
Note that this representation is often inefficient for downstream
analysis, and users may want to convert the output to a more standard R
data structure such as a data.frame or
data.table for easier manipulation.
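One possible conversion to a data.frame is sketched below in base R. It again uses a hand-built object with the hypothetical fields `id` and `score`, and it assumes every record holds scalar fields of a consistent type. Records with nested or array-valued fields would need a different approach.

```r
# Hypothetical list-of-lists with a dim attribute, standing in for the
# output obtained from a structured array (not read from a real file).
records <- list(
  list(id = 1L, score = 0.5),
  list(id = 2L, score = 1.5),
  list(id = 3L, score = 2.5),
  list(id = 4L, score = 3.5)
)
dim(records) <- c(2L, 2L)

# Row and column index of every element, in storage (column-major) order.
idx <- arrayInd(seq_along(records), dim(records))

# Extract each field as an atomic vector; vapply() checks type and length.
df <- data.frame(
  row   = idx[, 1],
  col   = idx[, 2],
  id    = vapply(records, function(r) r$id, integer(1)),
  score = vapply(records, function(r) r$score, numeric(1))
)
df
```

Keeping the `row` and `col` columns preserves the position of each record in the original array, so the long table can still be mapped back to the array shape if needed.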