
Conversation

@astrobc1
Contributor

@astrobc1 astrobc1 commented Sep 2, 2025

@zonca This is only a minor change. Currently the CalibrationStore is responsible for calling model.save(), but I would rather have the DRP be responsible for this. That way, the DRP only passes Python dicts and ORM objects, both of which the middleware understands. In the HISPEC DRP, this would mean adding something like model.save() to RegisterCalibrationPrimitive between lines 25-26.

This would technically break HISPEC_DRP until I make the corresponding change there. How do you recommend I handle cases like this?

@astrobc1 astrobc1 requested a review from zonca September 2, 2025 16:52
@zonca
Member

zonca commented Sep 2, 2025

Is this assuming that the calibration is already in the cache folder? I think the expected interface is: I give the calibration store either a FITS file or a data model, and the store handles copying or saving the file into the right folder automatically. Also, if given a FITS file, the store should read it and add proper metadata before saving it into the cache.

@astrobc1
Contributor Author

astrobc1 commented Sep 2, 2025

I think the appropriate order is:

  1. Save the .fits file to the local cache. In most cases, the file may not yet live anywhere on disk (perhaps it was just created), so copying an existing file won't work for most models unless the file is already saved elsewhere.
  2. Create the ORM object and add it to the local DB.
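The two-step order above can be sketched as follows. This is a minimal illustration, not the middleware's actual API: `cache_root`, `rel_path`, the `calibrations` table, and a plain row standing in for the ORM object are all hypothetical.

```python
import sqlite3
import tempfile  # used in the example at the bottom
from pathlib import Path


def register_local_calibration(cache_root: Path, db_path: Path,
                               rel_path: str, data: bytes, cal_type: str) -> None:
    """Hypothetical sketch of the two-step registration order."""
    # Step 1: write the file into the local cache first, since it may not
    # exist anywhere else on disk yet.
    dest = cache_root / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)

    # Step 2: add a metadata row (a stand-in for the ORM object) to the
    # local DB, recording only the cache-relative path.
    with sqlite3.connect(db_path) as con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS calibrations "
            "(id INTEGER PRIMARY KEY, rel_path TEXT, cal_type TEXT)"
        )
        con.execute(
            "INSERT INTO calibrations (rel_path, cal_type) VALUES (?, ?)",
            (rel_path, cal_type),
        )
```

Note the DB row stores only the cache-relative path, which matters for the sharing concern discussed further down.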

I'm not totally opposed to the Store calling model.save(). However, since the middleware has no dependence on the DRP and provides no specification of what a data model looks like, I would prefer the DRP handle this.

If given only a filename, I would require that the FITS file already has all of the necessary metadata populated (at least for HISPEC_DRP) and is otherwise correctly formatted, in which case a file copy is fine.
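The filename case described above amounts to a plain copy into the cache, trusting the file's metadata as-is. A sketch under those assumptions (`cache_root` and `rel_path` are illustrative, not the store's real parameters):

```python
import shutil
import tempfile  # used in the example at the bottom
from pathlib import Path


def register_from_file(cache_root: Path, src: Path, rel_path: str) -> Path:
    """Hypothetical sketch: register an already-complete FITS file by copying it
    into the cache. Assumes the file's metadata is already populated."""
    dest = cache_root / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # plain file copy; contents are trusted as-is
    return dest
```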

@zonca
Member

zonca commented Sep 2, 2025

@astrobc1 I added a simple test, please check if it makes sense.

What I am worried about is people sharing their local DB with paths that only work on their machines.
I would instead envision a system that stores all calibration files as paths relative to the root folder of the calibration store cache.

@astrobc1
Contributor Author

astrobc1 commented Sep 2, 2025

Let me know if I'm misunderstanding - the DB itself shouldn't contain local paths (only remote/archival paths). Sharing an entire local cache folder works because the full path to a calibration file is determined by the environment variable KOA_CALIBRATION_CACHE. Sharing just the SQLite .db file should also work, provided the actual files also live on the other machine.

There is still a relative path within the cache that needs to be formed (see here), so there is an assumption about how the cache folder is structured (see here).
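To illustrate the point above: because the DB records only cache-relative paths, the full path is resolved against the KOA_CALIBRATION_CACHE environment variable at runtime. A minimal sketch (`resolve_cached_file` is a hypothetical helper, not the store's actual API):

```python
import os
from pathlib import Path


def resolve_cached_file(rel_path: str) -> Path:
    """Resolve a cache-relative path against the cache root given by the
    KOA_CALIBRATION_CACHE environment variable."""
    cache_root = Path(os.environ["KOA_CALIBRATION_CACHE"])
    return cache_root / rel_path
```

This is why sharing the cache folder (or just the .db file, if the files exist on the other machine) works: nothing machine-specific is baked into the DB rows.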

@astrobc1 astrobc1 force-pushed the refactor/save-to-cache branch from 674433e to 16fb1ab Compare September 3, 2025 17:50
Member

@astrobc1 do you think this test shows a reasonable way of using the calibration store? can you make it more realistic? I think it would help me understand.

Contributor Author

Yes, I think test_store.py and test_koa_middleware_integration.py are both great tests and cover similar functionality to HISPEC's corresponding test.

Looking closer at how test_store.py calls query_by_id: should it return first() instead of all()? I think it's safe to assume "id" is always the primary key.
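The reasoning behind first() vs. all(): a query by primary key matches at most one row, so a single-row fetch is sufficient. Shown here with stdlib sqlite3 rather than the actual SQLAlchemy query in test_store.py; the table name is hypothetical:

```python
import sqlite3


def query_by_id(con: sqlite3.Connection, cal_id: int):
    """Look up a calibration row by its primary key. Since "id" is the
    primary key, at most one row can match, so fetchone() (the analogue of
    SQLAlchemy's first()/one_or_none()) is enough."""
    return con.execute(
        "SELECT id, cal_type FROM calibrations WHERE id = ?", (cal_id,)
    ).fetchone()
```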

Member

ok, let's look at test_store.py.

So the user has created their own flat named mock_cal. They want to use it for many pipelines in the future, so they save it into the calibration DB so that it is automatically retrieved given the proper query parameters.

This is what I would expect:

  1. I create an object that has both the data array of my flat and metadata

    I guess this should be CalibrationTestORM; I see the metadata, but where is the data array stored?

  2. I pass this to some register calibration method which adds a row to my calibration db and creates a FITS file of my flat inside the cache so in the future it is retrieved from there

    This should be store.register_local_calibration, but now you want to remove auto-saving of the file? Am I supposed to save the file somewhere myself, and then you trust me to give you the right path?

Contributor Author

Here are the two possible patterns/interfaces as I see it:

  1. The DRP is responsible for saving the calibration FITS file (data + metadata) to the cache by calling model.save(). The DRP is also responsible for constructing the ORM object by calling model.to_orm(), and passing only that ORM object to store.register_local_calibration() which will add the metadata (ORM object) to the SQLite DB.

  2. The DRP passes the data model directly to store.register_local_calibration(). The store calls model.save() to save the file to the cache, and then the store calls model.to_orm() and adds that ORM object to the local SQLite DB.

I agree option 2 is the simpler pattern, and if we can find a solution closer to that, I'm for it. The reason I like option 1 is because of dependencies/expectations of the middleware.

In option 1, the middleware doesn't care or need to know what a datamodel is.

In option 2, the middleware at minimum needs to know that a data model has model.save() and model.to_orm(). I could also be convinced the best solution is simply to enforce that as an interface requirement.
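The two patterns above can be contrasted with a toy sketch. FakeModel, the save()/to_orm() behavior, and both store classes are illustrative stand-ins for the real DRP/middleware objects, not their actual signatures:

```python
class FakeModel:
    """Minimal stand-in for a DRP data model."""
    def __init__(self):
        self.saved = False

    def save(self):
        self.saved = True            # would write the FITS file to the cache

    def to_orm(self):
        return {"cal_type": "flat"}  # would build the real ORM object


class StorePattern1:
    """Pattern 1: the DRP calls save() and to_orm() itself; the store only
    ever sees dicts/ORM objects and knows nothing about data models."""
    def __init__(self):
        self.rows = []

    def register_local_calibration(self, orm_obj):
        self.rows.append(orm_obj)


class StorePattern2:
    """Pattern 2: the DRP hands over the model; the store drives save() and
    to_orm(), so it must assume the model provides both methods."""
    def __init__(self):
        self.rows = []

    def register_local_calibration(self, model):
        model.save()
        self.rows.append(model.to_orm())
```

The trade-off is visible in the code: StorePattern1 has no knowledge of models at all, while StorePattern2 implicitly defines an interface (save and to_orm) that every model must satisfy.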

Contributor Author

An alternative approach is to consider how an instrument-specific middleware (e.g. liger_middleware) could solve this (I know this was the original plan). For example, a less stringent requirement would be that liger_middleware understands Liger data models.

Member

@zonca zonca Sep 5, 2025

I don't like pattern 1: the cache is part of the middleware, and the user should not mess with that.

I would go with 2 plus an interface requirement; save and to_orm methods seem easy to understand.
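One way such an interface requirement could be expressed is a typing.Protocol, shown here as a hedged sketch; the name CalibrationModel and the method signatures are assumptions, not the middleware's actual definitions:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CalibrationModel(Protocol):
    """Hypothetical interface: any data model passed to the store must
    provide save() and to_orm()."""

    def save(self) -> None:
        """Write the calibration file (data + metadata) to disk."""
        ...

    def to_orm(self) -> Any:
        """Build the ORM object to be inserted into the calibration DB."""
        ...
```

With @runtime_checkable, the store could validate incoming models with a simple isinstance check before calling either method.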

Contributor Author

Ok, then I'm fine with closing the PR and keeping main as is.

@astrobc1 astrobc1 closed this Sep 10, 2025