
Conversation

@astrobc1
Contributor

@astrobc1 astrobc1 commented Sep 2, 2025

@zonca This is only a minor change. Currently the CalibrationStore is responsible for calling model.save(), but I would rather have the DRP be responsible for this. That way, the DRP only passes Python dicts and ORM objects, both of which the middleware understands. In the HISPEC DRP, this would mean adding something like model.save() to RegisterCalibrationPrimitive between lines 25-26.

This would technically break HISPEC_DRP until I make the corresponding change there. How do you recommend I handle cases like this?

@astrobc1 astrobc1 requested a review from zonca September 2, 2025 16:52
@zonca
Member

zonca commented Sep 2, 2025

Is this assuming that the calibration is already in the cache folder? I think the expected interface is: I give the calibration store either a FITS file or a data model, and the store handles copying or saving the file into the right folder automatically. Also, if given a FITS file, the store should read it and add proper metadata before saving it into the cache.

@astrobc1
Contributor Author

astrobc1 commented Sep 2, 2025

I think the appropriate order is:

  1. Save the .fits file to the local cache. In most cases, the file may not yet live anywhere on disk (perhaps it was just created), so copying an existing file won't work for most models unless the file is already saved elsewhere.
  2. Create the ORM object and add it to the local DB.
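The two-step order above can be sketched as follows. This is a minimal illustration, not the middleware's actual API: `cache_root`, `rel_path`, the `calibrations` table, and a plain row standing in for the ORM object are all hypothetical.

```python
import sqlite3
import tempfile  # used in the example at the bottom
from pathlib import Path


def register_local_calibration(cache_root: Path, db_path: Path,
                               rel_path: str, data: bytes, cal_type: str) -> None:
    """Hypothetical sketch of the two-step registration order."""
    # Step 1: write the file into the local cache first, since it may not
    # exist anywhere else on disk yet.
    dest = cache_root / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)

    # Step 2: add a metadata row (a stand-in for the ORM object) to the
    # local DB, recording only the cache-relative path.
    with sqlite3.connect(db_path) as con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS calibrations "
            "(id INTEGER PRIMARY KEY, rel_path TEXT, cal_type TEXT)"
        )
        con.execute(
            "INSERT INTO calibrations (rel_path, cal_type) VALUES (?, ?)",
            (rel_path, cal_type),
        )
```

Note the DB row stores only the cache-relative path, which matters for the sharing concern discussed further down.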

I'm not totally opposed to the Store calling model.save(). However, since the middleware has no dependence on the DRP and provides no specification of what a data model looks like, I would prefer the DRP handle this.

If given only a filename, I would require that the FITS file already has all of the necessary metadata populated (at least for HISPEC_DRP) and is otherwise correctly formatted, in which case a file copy is fine.
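The filename case described above amounts to a plain copy into the cache, trusting the file's metadata as-is. A sketch under those assumptions (`cache_root` and `rel_path` are illustrative, not the store's real parameters):

```python
import shutil
import tempfile  # used in the example at the bottom
from pathlib import Path


def register_from_file(cache_root: Path, src: Path, rel_path: str) -> Path:
    """Hypothetical sketch: register an already-complete FITS file by copying it
    into the cache. Assumes the file's metadata is already populated."""
    dest = cache_root / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # plain file copy; contents are trusted as-is
    return dest
```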

@zonca
Member

zonca commented Sep 2, 2025

@astrobc1 I added a simple test, please check if it makes sense.

What I am worried about is people sharing their local DB with paths that only work on their machines.
I would instead envision a system that stores all calibration files as paths relative to the root folder of the calibration store cache.

@astrobc1
Contributor Author

astrobc1 commented Sep 2, 2025

Let me know if I'm misunderstanding - the DB itself shouldn't contain local paths (only remote/archival paths). Sharing an entire local cache folder works because the full path to a calibration file is determined by the environment variable KOA_CALIBRATION_CACHE. Sharing just the SQLite .db file should also work, provided the actual files also live on the other machine.

There is still a relative path within the cache that needs to be formed (see here), so there is an assumption about how the cache folder is structured (see here).
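To illustrate the point above: because the DB records only cache-relative paths, the full path is resolved against the KOA_CALIBRATION_CACHE environment variable at runtime. A minimal sketch (`resolve_cached_file` is a hypothetical helper, not the store's actual API):

```python
import os
from pathlib import Path


def resolve_cached_file(rel_path: str) -> Path:
    """Resolve a cache-relative path against the cache root given by the
    KOA_CALIBRATION_CACHE environment variable."""
    cache_root = Path(os.environ["KOA_CALIBRATION_CACHE"])
    return cache_root / rel_path
```

This is why sharing the cache folder (or just the .db file, if the files exist on the other machine) works: nothing machine-specific is baked into the DB rows.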

@astrobc1 astrobc1 force-pushed the refactor/save-to-cache branch from 674433e to 16fb1ab Compare September 3, 2025 17:50
Member

@astrobc1 do you think this test shows a reasonable way of using the calibration store? can you make it more realistic? I think it would help me understand.

Contributor Author

Yes, I think test_store.py and test_koa_middleware_integration.py are both great tests and cover similar functionality to HISPEC's corresponding test.

Looking closer at how test_store.py calls query_by_id: should it return first() instead of all()? I think it's safe to assume "id" is always the primary key.
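The reasoning behind first() vs. all(): a query by primary key matches at most one row, so a single-row fetch is sufficient. Shown here with stdlib sqlite3 rather than the actual SQLAlchemy query in test_store.py; the table name is hypothetical:

```python
import sqlite3


def query_by_id(con: sqlite3.Connection, cal_id: int):
    """Look up a calibration row by its primary key. Since "id" is the
    primary key, at most one row can match, so fetchone() (the analogue of
    SQLAlchemy's first()/one_or_none()) is enough."""
    return con.execute(
        "SELECT id, cal_type FROM calibrations WHERE id = ?", (cal_id,)
    ).fetchone()
```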

Member

ok, let's look at test_store.py.

So the user has created their own flat named mock_cal. They want to use it for many pipelines in the future, so they save it into the calibration DB so that it is automatically retrieved given the proper query parameters.

This is what I would expect:

  1. I create an object that has both the data array of my flat and metadata

    I guess this should be CalibrationTestORM; I see the metadata, but where is the data array stored?

  2. I pass this to some register calibration method which adds a row to my calibration db and creates a FITS file of my flat inside the cache so in the future it is retrieved from there

    This should be store.register_local_calibration, but now you want to remove auto-saving of the file? Am I supposed to save the file somewhere myself, and then you trust me to give you the right path?

Contributor Author

Here are the two possible patterns/interfaces as I see it:

  1. The DRP is responsible for saving the calibration FITS file (data + metadata) to the cache by calling model.save(). The DRP is also responsible for constructing the ORM object by calling model.to_orm(), and passing only that ORM object to store.register_local_calibration() which will add the metadata (ORM object) to the SQLite DB.

  2. The DRP passes the data model directly to store.register_local_calibration(). The store calls model.save() to save the file to the cache, and then the store calls model.to_orm() and adds that ORM object to the local SQLite DB.

I agree option 2 is the simpler pattern, and if we can find a solution closer to that, I'm for it. The reason I like option 1 is because of dependencies/expectations of the middleware.

In option 1, the middleware doesn't care or need to know what a datamodel is.

In option 2, the middleware at minimum needs to know that a data model has model.save() and model.to_orm(). I could also be convinced the best solution is simply to enforce that as an interface requirement.
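The two patterns above can be contrasted with a toy sketch. FakeModel, the save()/to_orm() behavior, and both store classes are illustrative stand-ins for the real DRP/middleware objects, not their actual signatures:

```python
class FakeModel:
    """Minimal stand-in for a DRP data model."""
    def __init__(self):
        self.saved = False

    def save(self):
        self.saved = True            # would write the FITS file to the cache

    def to_orm(self):
        return {"cal_type": "flat"}  # would build the real ORM object


class StorePattern1:
    """Pattern 1: the DRP calls save() and to_orm() itself; the store only
    ever sees dicts/ORM objects and knows nothing about data models."""
    def __init__(self):
        self.rows = []

    def register_local_calibration(self, orm_obj):
        self.rows.append(orm_obj)


class StorePattern2:
    """Pattern 2: the DRP hands over the model; the store drives save() and
    to_orm(), so it must assume the model provides both methods."""
    def __init__(self):
        self.rows = []

    def register_local_calibration(self, model):
        model.save()
        self.rows.append(model.to_orm())
```

The trade-off is visible in the code: StorePattern1 has no knowledge of models at all, while StorePattern2 implicitly defines an interface (save and to_orm) that every model must satisfy.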

Contributor Author

An alternative approach is to consider how an instrument-specific middleware (e.g. liger_middleware) could solve this (I know this was the original plan). For example, a less stringent requirement would be that liger_middleware understands Liger data models.

Member

@zonca zonca Sep 5, 2025

I don't like pattern 1: the cache is part of the middleware, and the user should not mess with that.

I would go with 2 plus an interface requirement; save and to_orm methods seem easy to understand.
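One way such an interface requirement could be expressed is a typing.Protocol, shown here as a hedged sketch; the name CalibrationModel and the method signatures are assumptions, not the middleware's actual definitions:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CalibrationModel(Protocol):
    """Hypothetical interface: any data model passed to the store must
    provide save() and to_orm()."""

    def save(self) -> None:
        """Write the calibration file (data + metadata) to disk."""
        ...

    def to_orm(self) -> Any:
        """Build the ORM object to be inserted into the calibration DB."""
        ...
```

With @runtime_checkable, the store could validate incoming models with a simple isinstance check before calling either method.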

Contributor Author

Ok, then I'm fine with closing the PR and keeping main as is.

@astrobc1 astrobc1 closed this Sep 10, 2025