
Conversation

@jo-mueller
Contributor

@jo-mueller jo-mueller commented Aug 21, 2025

Dear all,

This PR is intended as a fix for #322 and #309. Some confusion was raised in these issues as to whether different version keys would be allowed in an OME-Zarr's top-level zarr.json and in the plate/well/labels groups nested within it.

In the current examples on the 0.5 branch (e.g., the plate example), the version key is only given at the top level and not at the group level. This contradicts the current text of the plate spec, which states:

The plate dictionary MUST contain a version key whose value MUST be a string specifying the version of the plate specification.

Same for wells and labels groups examples.

Context and consequences

For context, the version key was originally moved up into the top-level ome group with the rationale of discouraging different metadata versions inside the same file, according to RFC2:

While technically possible, OME-Zarr 0.5 (with Zarr v3) and OME-Zarr 0.4 (with Zarr v2) metadata could exist side-by-side in a Zarr hierarchy, it is NOT RECOMMENDED. This may be useful for short periods of time (i.e. during migrations from 0.4 to 0.5), but should not be used longer term. Multiple metadata versions can lead to conflicts, which may be hard to resolve by implementations. If implementations encounter 0.4 and 0.5 metadata side-by-side, 0.5 SHOULD be treated preferentially.

Removing the statements about the version keys in the plate/well/labels groups essentially drops the idea of differing versions inside a single OME-Zarr file, which I think is a reasonable step from an implementation perspective. It also brings the spec back into sync with the examples, which I think is important for the integrity of this project.

I modified the statement at the top-level ome group into the following to make this clear:

The OME-Zarr Metadata is stored in the various zarr.json files throughout the above array
hierarchy. The OME-Zarr Metadata version MUST be consistent within a hierarchy. The top-level ome
group MUST contain a version key whose value MUST be a string specifying the version of the
metadata specification.
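To make this concrete, here is a minimal sketch (illustrative only, not normative; the plate keys are abbreviated from the 0.5 examples) of a top-level zarr.json under the proposed wording, with the version key present solely in the top-level ome object:

```python
# Illustrative sketch only: a trimmed-down top-level zarr.json for a plate.
# The "version" key appears once, in the top-level "ome" object, and is
# absent from the nested plate metadata.
top_level_zarr_json = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {
        "ome": {
            "version": "0.5",  # the single, hierarchy-wide version
            "plate": {  # no "version" key here under the proposed wording
                "columns": [{"name": "1"}],
                "rows": [{"name": "A"}],
                "wells": [{"path": "A/1", "rowIndex": 0, "columnIndex": 0}],
            },
        }
    },
}

ome = top_level_zarr_json["attributes"]["ome"]
assert "version" in ome and "version" not in ome["plate"]
```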

Let me know what you think :)

cc @joshmoore @normanrz @dstansby @ziw-liu

PS: Hope I got this PR configured right - had a bit of trouble getting the knack of contributing with submodules (#323)

Member

@will-moore will-moore left a comment


Looks good, thx. 👍

@jo-mueller
Contributor Author

@will-moore is this the correct target branch for this fix? Just wondering about the failing checks.

@will-moore
Member

@jo-mueller I'm not up to date with the strategy for PRs; I don't even know whether this should be opened against v0.5 (which would create a v0.5.n+1 version of the spec) or if we just go for v0.6?
The discussion at #276 seems to have some pointers... I'll see what I can understand from that...

@ziw-liu
Contributor

ziw-liu commented Aug 22, 2025

For existing v0.5 datasets that have version keys inside plate and well objects, will this change make them invalid OME-Zarr files?

@jo-mueller
Contributor Author

@ziw-liu THAT is a good question. I think it's safer to just commit this into a new 0.6-dev branch. For the 0.5 branch, it might be better to simply update the examples therein and make sure that everything is properly synced up before 0.6 comes out.

@d-v-b
Contributor

d-v-b commented Aug 23, 2025

Generally I don't think there's language in the spec that prevents extra keys in the various JSON objects -- my understanding is that the spec generally describes JSON objects in positive terms (e.g., 'object "foo" MUST have key "bar"'), and doesn't say anything about keys an object "foo" MUST NOT have. The main effect of this change would be on any implementations that implemented the actual spec as written -- they would need to change their behavior.

@ziw-liu
Contributor

ziw-liu commented Aug 24, 2025

Generally I don't think there's language in the spec that prevents extra keys in the various JSON objects

I agree with this observation. However in practice various implementations do prohibit extra keys. Maybe a separate clarification is needed, e.g. ome objects MAY contain keys not described in this specification.

@LDeakin

LDeakin commented Aug 24, 2025

Yes, further clarification on the presence of extra keys would be helpful. This came up yesterday in zarrs/ome_zarr_metadata#9.

@d-v-b
Contributor

d-v-b commented Aug 24, 2025

I would recommend adding language like this at the top of the spec:

Extra keys in JSON objects

Unless otherwise stated, a JSON object defined in this document MAY contain additional keys beyond those enumerated in its definition. OME-Zarr Implementations MUST allow extra keys in such JSON objects. This means that, when reading OME-Zarr data, the presence of an extra key in such a JSON object MUST NOT be treated as an error.

Whether an implementation ignores the extra keys or propagates them is up to that particular implementation. It is recommended that implementations clearly declare how they handle extra keys, with a particular emphasis on informing users in the context of irreversible operations. For example, an implementation that resaves OME-Zarr data and ignores extra keys could implement a routine that warns users when they attempt to resave OME-Zarr data with extra keys.
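As a rough sketch of what such reader behaviour could look like (hypothetical helper and key set, not taken from the spec):

```python
import warnings

# Hypothetical, abbreviated set of plate keys this reader recognises.
KNOWN_PLATE_KEYS = {"name", "columns", "rows", "wells", "acquisitions", "field_count"}

def read_plate(plate_obj: dict) -> dict:
    """Read plate metadata, tolerating extra keys as the proposed text requires."""
    extra = set(plate_obj) - KNOWN_PLATE_KEYS
    if extra:
        # Extra keys MUST NOT be treated as an error; warning makes the
        # handling visible to users before any irreversible resave.
        warnings.warn(f"ignoring unrecognised plate keys: {sorted(extra)}")
    return {k: v for k, v in plate_obj.items() if k in KNOWN_PLATE_KEYS}

plate = {"columns": [], "rows": [], "wells": [], "version": "0.5"}
cleaned = read_plate(plate)  # warns about "version" instead of raising
```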

jo-mueller and others added 2 commits August 25, 2025 09:57
@jo-mueller
Contributor Author

jo-mueller commented Aug 25, 2025

@d-v-b I added the proposed statement under the document conventions section. 👍

Co-Authored-By: David Stansby <d.stansby@ucl.ac.uk>
@dstansby
Contributor

It's worth noting that if arbitrary extra keys are allowed, this removes any possibility of backwards compatibility between future versions of the specification.

To give a concrete example, say someone writes data that contains a color: "blue" field somewhere in the "ome" metadata in a hypothetical version 1.2 of OME-Zarr, and then version 1.3 of OME-Zarr introduces a specification for the color field that says the metadata MAY have a "color" field and it MUST be a color hex string. The previous color: "blue" metadata would now be invalid. This means OME-Zarr 1.2 data could not generally also be considered valid OME-Zarr 1.3 data.

If instead arbitrary extra keys are prohibited in the "ome" metadata, it leaves open the possibility that all valid metadata written with one version of the specification is also valid with the next.

Given OME-Zarr has reserved the top-level "ome" key in the JSON attributes, I actually now think that extra keys should be prohibited to give the option of backwards compatibility. Instead, extra keys should live outside the "ome" field.

This would also limit the proliferation of "unofficial" metadata under the "ome" field, keeping the metadata there tightly controlled to what is allowed within the specification.
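The hypothetical color example above can be made concrete (the version numbers and the hex-string rule come from the comment itself, not from any real spec):

```python
import re

def color_valid_in_v1_3(meta: dict) -> bool:
    # Hypothetical v1.3 rule from the example above: if "color" is present,
    # it MUST be a hex string like "#0000ff".
    color = meta.get("color")
    return color is None or bool(re.fullmatch(r"#[0-9a-fA-F]{6}", color))

meta_from_v1_2 = {"color": "blue"}  # a legal extra key under hypothetical v1.2
assert not color_valid_in_v1_3(meta_from_v1_2)  # but invalid under v1.3
assert color_valid_in_v1_3({"color": "#0000ff"})
```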

@d-v-b
Contributor

d-v-b commented Aug 26, 2025

This is a good point! In terms of what is right for the future of the spec, I don't have strong feelings about whether extra keys should be allowed or not. I can definitely see the appeal of keeping the "ome" namespace tight; users have the rest of the attributes namespace for their own stuff. On the other hand, OME-Zarr might want to define some extensibility semantics, in which case maybe extra keys could be allowed, as long as they can be differentiated from "official" keys (e.g., via an "omezarr." prefix or something, but that could get tedious).

But in terms of interpreting what the spec says today, I think disallowing extra keys would be potentially breaking for anyone who saved data with an extra key because they thought it was allowed.

@normanrz
Contributor

I agree with @dstansby here that extra keys should not be allowed for 1.x versions. I don't see much harm for 0.x versions, but the question is whether we want to promote it now, if we'll disallow it later. Perhaps, we should just carve out an exception for the "version" field within the 0.5 versions.

@d-v-b
Contributor

d-v-b commented Aug 26, 2025

see also: #209, maybe we could get a decision here and close that issue :)

@d-v-b
Contributor

d-v-b commented Aug 26, 2025

cc @thewtex since ngff-zarr saves an extra "@type" key to 0.5 multiscales metadata

@jo-mueller
Contributor Author

Thanks all for the insight and contributions to the discussion! From what I have gathered above, an agreeable solution would be the following:

  • Define "protected" keys in the json file, such as the "ome" key.
  • In these protected fields, additional metadata MAY be stored transitionally. Implementations MUST NOT throw errors upon encountering it.
  • Starting from version x.x (1.x?), additional metadata MUST NOT be stored in protected fields anymore. Implementations SHOULD treat encountered foreign keys as (warnings/errors).

That should give everyone plenty of time to adapt. Maybe it will also kickstart the discussion on plug-in specifications (tables, spatialdata, etc.) under protected keys outside of the "ome" key.

@clbarnes
Contributor

Neither banning nor allowing extra keys solves backwards compatibility in the general case; they just have different trade-offs. Banning additional keys breaks backwards compatibility when a key is removed from the spec (e.g. version, per @ziw-liu's comment), but allowing additional keys breaks backwards compatibility when a new key is introduced that someone else had already arbitrarily included in their metadata (per @dstansby's comment).

Arguably, as the spec evolves we're more likely to add keys than remove them (i.e. we should ban extra keys). However, banning additional keys causes backwards compatibility problems right now, where the problem it solves would only occur if A) we removed a field from the spec in one version, and then added it back in a future version, or B) someone is already adding arbitrary keys to their OME metadata (i.e. using it incorrectly).

IMO, the best solution is something like

Implementations SHOULD NOT raise errors (but MAY raise warnings) when additional fields are present, to allow backwards compatibility if those fields are removed in future versions of the OME-Zarr spec.

and

Users MUST NOT add arbitrary fields inside the ome object or sub-objects, to allow backwards compatibility if fields of the same name are added in future versions of the OME-Zarr spec.

Implementations SHOULD prevent users from adding such arbitrary fields where possible.

and then spec designers try to avoid removing and then re-adding a field of the same name.

Nobody can stop a JS or python user from doing something stupid if they really try, but this at least sets expectations and guides implementations with better type systems.
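The asymmetry between the two proposed rules can be sketched like this (hypothetical, abbreviated key set; illustrative only, not an implementation of any library):

```python
import warnings

ALLOWED_WELL_KEYS = {"images"}  # hypothetical, abbreviated key set

def write_well(meta: dict) -> dict:
    # Writing: users MUST NOT add arbitrary fields, so the writer rejects them.
    extra = set(meta) - ALLOWED_WELL_KEYS
    if extra:
        raise ValueError(f"fields not defined by the spec: {sorted(extra)}")
    return meta

def read_well(meta: dict) -> dict:
    # Reading: implementations SHOULD NOT error (but MAY warn) on extra
    # fields, for compatibility with fields removed in other spec versions.
    extra = set(meta) - ALLOWED_WELL_KEYS
    if extra:
        warnings.warn(f"unknown fields (possibly from another spec version): {sorted(extra)}")
    return meta
```

Strict on write, lenient on read: the writer keeps new data clean, while the reader stays tolerant of data produced under other versions.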

@d-v-b
Contributor

d-v-b commented Aug 27, 2025

Users MUST NOT add arbitrary fields inside the ome object or sub-objects, to allow backwards compatibility if fields of the same name are added in future versions of the OME-Zarr spec.

Implementations SHOULD prevent users from adding such arbitrary fields where possible.

This requires language permitting lower-level objects to relax this rule as needed, since the spec currently recommends(!) saving an unstructured dict in the multiscales metadata:

It SHOULD contain the field "metadata", which contains a dictionary with additional information about the downscaling method.

Nobody can stop a JS or python user from doing something stupid if they really try, but this at least sets expectations and guides implementations with better type systems.

IMO JS (via typescript) and python both offer everything we need to completely model this stuff. Whether people use it is a separate question.

@dstansby
Contributor

Good point about the downside of banning keys - I agree though that the spec is more likely to add keys than remove them. If keys are removed, MUST contain could be converted to MAY contain statements, which would allow for backwards compatibility.

However, banning additional keys causes backwards compatibility problems right now

Hasn't every version of the spec so far been backwards incompatible with the previous version(s) anyway? (with few (any?) tools available to convert metadata between incompatible versions). At this pre-1.0 stage, preserving backwards compatibility does not seem to have been a consideration so far.

Having language that prohibits a particular metadata structure, but then gives implementations a get-out clause to allow data to violate that structure, seems like a bad idea to me, and defeats the point of having a specification in the first place.

I like the suggestion in #324 (comment) to ban extra metadata from 1.0 onwards, to allow for backwards compatibility in the future. In tandem with the points in #324 (comment), that would suggest that before 1.0 we could say additional metadata SHOULD NOT be stored under the "ome" key to discourage it, and then change SHOULD NOT to MUST NOT in version 1.0.

Finally, defining or recommending what implementations do is quite a new topic, and so far has not been incorporated into the specification. It is a much more general topic than just the change proposed here, so perhaps discussion about including recommendations for implementations should happen elsewhere to separate concerns (and keep it easy to discuss one thing at a time)?

@jo-mueller
Contributor Author

Maybe it makes sense to continue this discussion in a separate issue? Not to step on anyone's toes but I think the PR got a bit sidetracked from the original purpose, which was aligning spec and examples 😅 This should get some people on their way with regard to whether a version key should be present in plates/wells.

How restrictive/permissive the spec should be with respect to extra keys in the json is a matter that requires a bit more careful discussion, I think.

In order to allow this to go through, I would remove the paragraph suggested by @d-v-b from the top:

Extra keys in JSON objects

Unless otherwise stated, a JSON object defined in this document MAY contain additional keys
beyond those enumerated in its definition. The presence of an additional key in such a JSON
object MUST NOT be treated as an error by implementations.

Whether an implementation ignores the extra keys or propagates them is up to that particular
implementation. It is recommended that implementations clearly declare how they handle extra
keys, with a particular emphasis on informing users in the context of irreversible operations.
For example, an implementation that resaves OME-Zarr data and ignores extra keys could implement
a routine that warns users when they attempt to resave OME-Zarr data with extra keys.

@d-v-b
Contributor

d-v-b commented Aug 28, 2025

How restrictive/permissive the spec should be with respect to extra keys in the json is a matter that requires a bit more careful discussion, I think.

I think there are 2 topics, both of which were covered in this thread:

  1. How can your PR (which removes some keys) avoid putting existing data out of spec?

    I think some text that clarifies the common interpretation (that extra keys are allowed) is necessary here, because @ziw-liu asked what would happen to data already saved with the keys that are no longer described in the spec.

  2. How should future versions of the spec handle extra keys. That's out of scope for this PR and IMO this discussion should continue in extra fields in metadata #209.

@jo-mueller
Contributor Author

Thanks for linking the issue! I'll write up a summary of this thread over there.

@jo-mueller jo-mueller requested a review from d-v-b August 29, 2025 12:49
Contributor

@d-v-b d-v-b left a comment


This looks good to me, provided that everyone considers it obvious that extra keys are allowed for reading. Because this change removes required metadata keys, there will be datasets in the wild that were previously compliant and used these keys. They should continue to be compliant after these changes. That means their extra keys must be accepted by readers.

@dstansby
Contributor

dstansby commented Aug 29, 2025

Is there a desire/need to have backwards compatibility between versions of the spec now? As far as I remember every version increment has been backwards incompatible up to now.

@dstansby
Contributor

Hold on a minute, this isn't another proposal to silently edit (i.e., change without incrementing the version number) the 0.5 specification, is it? If it is, please please please can we stop doing this, and make changes to the next version instead. Silently editing the specification without incrementing the version defeats the point of having a versioned spec in the first place.

@d-v-b
Contributor

d-v-b commented Aug 29, 2025

good point, I wasn't even thinking in terms of versioning... how does that work again for this spec?

@jo-mueller
Contributor Author

Presumably (because data written after this change is not compatible with the spec before this change) this

Wouldn't say so. The change just doesn't demand the version to be written in the plate/well/etc structure, but it doesn't invalidate the keys if they are there. Once this change is in, the already existing version keys will just be extra keys that shouldn't cause readers to throw errors when encountered.

But I'm also fine with changing the target of this PR to a new 0.6dev and just leave the inadequacies of older versions in the past. That's probably the whole point of versioning...

@dstansby
Contributor

Wouldn't say so. The change just doesn't demand the version to be written in the plate/well/etc structure, but it doesn't invalidate the keys if they are there.

I think you're saying that this change is forwards compatible (all data written before the change is valid after the change), but I'm saying that the change is not backwards compatible (not all data written after the changes is valid with the spec before the change) - in this case, if data is written without a version key, it is not valid 0.5 data (according to the first 'release' of the 0.5 specification)

(maybe I get backwards and forwards compatible the wrong way above - it always confuses me which way round they are)

@jo-mueller
Contributor Author

As for versioning, I think the idea now would be to create a new branch (0.6dev) or something of the sort and then collect all changes for a future 0.6 there. For release, the branch would be renamed to 0.6 and added as a submodule to the main repo. The link under latest should be changed to point to the 0.6 submodule.

Maybe @joshmoore could help us there?

@normanrz
Contributor

I think it should go in 0.5.3, because this bug originates in 0.5.

@jo-mueller
Contributor Author

Back from vacation - maybe we can all agree that, whether or not it goes into a 0.5.3 version, it should definitely be part of an upcoming 0.6? Once there is an active development head here, I'll change the target of the PR to that so we can decouple this from the question of whether or how older versions should be modified.
