When: 27 September 2021
Zarr is a cloud-friendly format for storing chunked, compressed N-dimensional arrays, with its reference implementation written in Python. Zarr is a young project but is already very popular, especially in parallel computing and cloud storage contexts. A key feature of a zarr store is that each data array is divided into chunks (pieces), and each chunk is compressed separately. The optimal chunk shape depends on how the data will be accessed, and performance can vary greatly with different chunk choices. Choosing the right chunking for the data is therefore the key decision in creating the “best” zarr possible.
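To see why access patterns drive the choice, note that reading any element of a compressed chunk generally means fetching and decompressing that whole chunk. The number of chunks a read touches is therefore a rough proxy for its I/O cost. The sketch below is plain Python and does not call the zarr library. The 4D layout (time, height, lat, lon), the array sizes, the two candidate chunk shapes and the helper names `chunks_touched` and `chunk_nbytes` are all hypothetical, chosen only for illustration.

```python
import math


def chunks_touched(shape, chunks, selection):
    """Count how many chunks a rectangular read must load.

    shape:     full array shape, e.g. (time, height, lat, lon)
    chunks:    chunk shape, same number of dimensions as shape
    selection: one (start, stop) pair per dimension, stop exclusive
    """
    total = 1
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        stop = min(stop, size)
        first = start // chunk        # index of first chunk hit
        last = (stop - 1) // chunk    # index of last chunk hit
        total *= last - first + 1
    return total


def chunk_nbytes(chunks, itemsize=4):
    """Uncompressed size of one chunk in bytes (float32 by default)."""
    return math.prod(chunks) * itemsize


# Hypothetical dataset: one year of hourly data, 10 heights, 200 x 200 grid.
shape = (8760, 10, 200, 200)

# Two hypothetical chunkings, each favouring a different access pattern.
layouts = {
    "map-oriented": (1, 10, 200, 200),
    "time-series-oriented": (8760, 10, 10, 10),
}

# Access A: full time series at one grid point, all heights.
point_series = ((0, 8760), (0, 10), (50, 51), (50, 51))
# Access B: one horizontal map at one time step and one height.
single_map = ((0, 1), (3, 4), (0, 200), (0, 200))

for name, chunks in layouts.items():
    print(
        name,
        chunks_touched(shape, chunks, point_series),
        chunks_touched(shape, chunks, single_map),
        chunk_nbytes(chunks),
    )
```

Under these assumptions, the map-oriented layout reads 1 chunk for a single map but 8760 chunks for a point time series, while the time-series-oriented layout reverses the trade-off (1 chunk versus 400). Neither shape is "correct" in general; the better one is the one that matches how the data will actually be read.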
The wind industry is undoubtedly moving towards wind resource assessment strategies that involve ever larger amounts of data. Many popular industry formats cannot adequately capture 4D datasets, so there is a need to explore formats that optimize both storage space and data usability, especially formats designed for cloud storage and big-data processing. The aim of this presentation is to encourage the use of zarr by providing basic guidelines for chunking, since we envision zarr as the new standard for handling 4D datasets in the industry.