zarr-conventions/dependent-arrays

Dependent Arrays

Description

This Zarr convention defines a scheme for expressing a dependency relationship between Zarr arrays. A Zarr array, called a "primary array", declares so-called "dependent arrays" in its metadata document via a field in attributes.

  • The declaration of dependent arrays is via a JSON object where the keys are names of dependent arrays, and the values are array metadata documents that describe those arrays.
  • Dependent arrays are only discoverable by reading the primary array metadata document and inspecting the attributes field.
  • Dependent arrays store their chunks under the same prefix as the primary array.

Configuration

This convention only applies to Zarr arrays.

Dependent arrays are declared by a dependent-arrays key in the attributes field of an array metadata document. The dependent-arrays field is a JSON object. The keys of the dependent-arrays object are the names of dependent arrays. These keys are subject to the same rules defined for array and group names in the Zarr V3 specification. The values of the dependent-arrays object are partial Zarr V3 metadata documents for those arrays. See the "Partial array metadata" section for details on how a partial document is transformed into a complete one.
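
As an illustration of the discovery rule above, the following Python sketch reads the declarations out of a primary array metadata document that has already been parsed into a dict. The function name declared_dependent_arrays and the sample document are hypothetical and not part of this convention; the sketch only checks that the node is an array and that the field is a JSON object, and does not validate the names.

```python
from typing import Any

# Key used by this convention inside the `attributes` field.
CONVENTION_KEY = "dependent-arrays"


def declared_dependent_arrays(array_metadata: dict[str, Any]) -> dict[str, Any]:
    """Return the mapping of dependent array name -> partial metadata document.

    Hypothetical helper: returns an empty dict when nothing is declared.
    """
    # The convention only applies to Zarr arrays, not groups.
    if array_metadata.get("node_type") != "array":
        raise ValueError("dependent arrays can only be declared on array nodes")
    attributes = array_metadata.get("attributes", {})
    declared = attributes.get(CONVENTION_KEY, {})
    if not isinstance(declared, dict):
        raise TypeError(f"{CONVENTION_KEY!r} must be a JSON object")
    return declared


# Sample primary array metadata (abbreviated, for illustration only).
primary = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [128, 128],
    "attributes": {
        "dependent-arrays": {"s1": {"shape": [64, 64]}, "s2": {"shape": [32, 32]}},
    },
}
names = sorted(declared_dependent_arrays(primary))
```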

Partial array metadata

It is expected (but not required) that the metadata for dependent arrays will have fields in common with the metadata for the primary array, such as codecs and chunk_grid. This convention defines a representation for array metadata that allows dependent array metadata documents to reuse fields of the primary array metadata document without re-declaring them.

The metadata documents for dependent arrays can be declared in partial form. A partial array metadata document is the same as a regular array metadata document, except that any keys which are required for a complete metadata document can be missing from a partial metadata document. For example, the empty object {} is a valid partial array metadata document with every key missing.

To transform a partial array metadata document declared in the dependent-arrays field into a complete metadata document, any missing field other than the attributes field is filled in with the value of the field with the same name from the primary array metadata document. The attributes field is handled specially: if the attributes field is missing from a dependent array metadata document, the value of the attributes field of the primary array is used after removing the dependent-arrays field. This prevents infinitely re-declaring the same dependent array.
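
The transformation described above can be sketched in Python as follows. This is a minimal illustration, assuming both documents are already parsed into dicts; the function name complete_metadata and the sample "units" attribute are hypothetical and not defined by this convention.

```python
import copy
from typing import Any

# Key used by this convention inside the `attributes` field.
CONVENTION_KEY = "dependent-arrays"


def complete_metadata(primary: dict[str, Any], partial: dict[str, Any]) -> dict[str, Any]:
    """Fill in a partial dependent array metadata document from the primary one."""
    complete = copy.deepcopy(partial)
    # Any missing field other than `attributes` is copied from the primary array.
    for key, value in primary.items():
        if key != "attributes" and key not in complete:
            complete[key] = copy.deepcopy(value)
    # `attributes` is special: inherit it only when missing, and drop the
    # dependent-arrays declaration so it is not re-declared on the dependent array.
    if "attributes" not in complete and "attributes" in primary:
        inherited = copy.deepcopy(primary["attributes"])
        inherited.pop(CONVENTION_KEY, None)
        complete["attributes"] = inherited
    return complete


# Sample primary array metadata (abbreviated, for illustration only).
primary = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [128, 128],
    "data_type": "uint8",
    "attributes": {
        "units": "m",
        "dependent-arrays": {"s1": {"shape": [64, 64]}},
    },
}
s1 = complete_metadata(primary, primary["attributes"]["dependent-arrays"]["s1"])
```

In this sketch, s1 keeps its own shape, inherits data_type from the primary array, and receives the primary attributes without the dependent-arrays field. The primary document itself is left unmodified because every inherited value is deep-copied.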

Examples

Multiscale arrays

This example array metadata document demonstrates the application of dependent array metadata to declare a collection of multiscale arrays:

{
  "zarr_format": 3,
  "node_type": "array",
  "shape": [128, 128],
  "data_type": "uint8",
  "chunk_grid": {
      "name": "regular",
      "configuration": {"chunk_shape": [32, 32]}
  },
  "chunk_key_encoding": {
      "name": "prefix",
      "configuration": {
          "prefix": "s0/",
          "base_encoding": "default"
      }
  },
  "codecs": ["bytes", "gzip"],
  "dimension_names": ["a", "b"],
  "attributes": {
      "dependent-arrays": {
          "s1": {
              "shape": [64, 64],
              "chunk_key_encoding": {
                  "name": "prefix",
                  "configuration": {
                      "prefix": "s1/",
                      "base_encoding": "default"
                  }
              }
          },
          "s2": {
              "shape": [32, 32],
              "chunk_key_encoding": {
                  "name": "prefix",
                  "configuration": {
                      "prefix": "s2/",
                      "base_encoding": "default"
                  }
              }
          }
      }
  }
}

Note that the primary array and each dependent array use a distinct chunk key encoding configuration, so that their chunks are stored under different keys. This example uses the proposed prefix chunk key encoding.

Implementation notes

Chunking

Most usage of dependent arrays will require that the chunks of the primary array and the dependent arrays use different chunk key encoding configurations so that no two chunks have the same key in storage.

Using only the chunk key encodings (default and v2) defined in the Zarr V3 specification, there are only 4 possible non-intersecting sets of chunk keys for arrays with 2 or more dimensions, which means a maximum of 3 dependent arrays can be declared if each array will be filled entirely. For one- or zero-dimensional arrays, the separator parameter of the default and v2 chunk key encodings has no effect, and thus there are only 2 possible sets of chunk keys.
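
To make the count for arrays with 2 or more dimensions concrete, the sketch below enumerates the chunk keys of a 4 x 4 chunk grid under the four combinations of encoding (default, v2) and separator ("/", ".") and checks that the resulting sets are pairwise disjoint. The helper names chunk_key and chunk_keys are hypothetical; the key formats follow the default and v2 encodings as defined in the Zarr V3 specification.

```python
from itertools import product


def chunk_key(index, encoding, separator):
    """Return the storage key of one chunk for the given encoding and separator."""
    parts = [str(i) for i in index]
    if encoding == "default":
        # `default` keys start with the literal prefix "c".
        return separator.join(["c", *parts])
    if encoding == "v2":
        # `v2` keys are the joined indices; a zero-dimensional array uses "0".
        return separator.join(parts) if parts else "0"
    raise ValueError(f"unknown chunk key encoding: {encoding!r}")


def chunk_keys(grid_shape, encoding, separator):
    """Return the set of keys for every chunk in a chunk grid."""
    indices = product(*(range(n) for n in grid_shape))
    return {chunk_key(idx, encoding, separator) for idx in indices}


# The four configurations available for a two-dimensional array.
configs = [("default", "/"), ("default", "."), ("v2", "/"), ("v2", ".")]
key_sets = [chunk_keys((4, 4), enc, sep) for enc, sep in configs]

# True when no key appears in more than one configuration.
all_disjoint = all(
    a.isdisjoint(b) for i, a in enumerate(key_sets) for b in key_sets[i + 1:]
)
```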

Many more non-intersecting sets would be possible with chunk key encodings that allow a metadata-defined prefix or suffix.
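
As a rough illustration, the sketch below applies a prefix to the keys of the default encoding for the three arrays of the multiscale example. It rests on an assumption: that the proposed prefix chunk key encoding simply prepends the configured prefix to the key produced by its base encoding. The actual behaviour is defined by that proposal, not here, and the helper names are hypothetical.

```python
from itertools import product


def default_key(index, separator="/"):
    """Chunk key under the Zarr V3 `default` encoding."""
    return separator.join(["c", *(str(i) for i in index)])


def prefixed_key(index, prefix):
    # ASSUMPTION: the proposed `prefix` encoding prepends the configured
    # prefix to the key produced by the base (`default`) encoding.
    return prefix + default_key(index)


def key_set(grid_shape, prefix):
    """Return the set of prefixed keys for every chunk in a chunk grid."""
    indices = product(*(range(n) for n in grid_shape))
    return {prefixed_key(idx, prefix) for idx in indices}


# Chunk grids implied by the multiscale example: array shape / chunk shape (32, 32).
levels = {"s0/": (4, 4), "s1/": (2, 2), "s2/": (1, 1)}
key_sets = [key_set(grid, prefix) for prefix, grid in levels.items()]

# True when no two arrays share a chunk key.
no_collisions = all(
    a.isdisjoint(b) for i, a in enumerate(key_sets) for b in key_sets[i + 1:]
)
```

Under this assumption, every distinct prefix yields a separate key set, so the number of dependent arrays is no longer bounded by the number of built-in encoding configurations.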

Write synchronization

When dependent arrays are used to model a dependency relationship between one array (the primary array) and arrays derived from that array (the dependent arrays), changes to the primary array likely need to be coordinated with changes to the dependent arrays. In these cases, only tools capable of coordinating these changes should be given mutable access to arrays that declare dependent arrays.

About

A Zarr Convention defining directed relationships between arrays.

Generated from zarr-conventions/template