From b7194c8a390745890ba3f0dc3a9665d791ccbf5b Mon Sep 17 00:00:00 2001 From: rod-glover Date: Tue, 12 Aug 2025 17:47:46 -0700 Subject: [PATCH 01/13] Initial draft --- docs/bookmarks.md | 285 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 285 insertions(+) create mode 100644 docs/bookmarks.md diff --git a/docs/bookmarks.md b/docs/bookmarks.md new file mode 100644 index 00000000..47b242a7 --- /dev/null +++ b/docs/bookmarks.md @@ -0,0 +1,285 @@ +# Bookmarks aka versions aka tags aka commits aka ... + +OK, let's use the term "bookmark" for now. Single word, not otherwise used, seemingly intuitive meaning, easy to search/replace. + +We want to be able to "bookmark" a state of the database (i.e., bookmark a point in its history) with an identifier that can be used to reconstruct that state after subsequent changes to the database. The nominal use case is "bookmark the present state," but there are definitely other possibilities. + +## Terminology + +**Bookmark**: A named object that points at a set of history records, one per history-tracked table. This definition conflates the notion of a bookmark proper and the association of it to history records, which actually are distinct, but it will suffice for now. + +## Facts and assumptions + +**Facts** + +- History tables are append-only. +- Each history table records the changes made to the entire collection *in temporal order of the changes*. +- Each successive update to the collection is recorded by adding a record at the end, which so that temporal order is also the order by ascending unique history id. + +**Assumptions** + +- No existing record in the history table is ever modified. + +**Therefore** + +- If a bookmark is associated to a record in a history table, it represents the history of that collection up to that point in time. A bookmark can be thought of by analogy with a Git tag, in the sense that both are pointers to a specific state of the relevant items. 
+- Two such bookmark associations, say $B_1$ and $B_2$, bracket a set of changes recorded in the history table. The delta between them is exactly those changes recorded in the history table, in history id order, between $B_1$ (exclusive) and $B_2$ (inclusive). + + +**FIXME**: Bookmark associations can be, and most naturally are, stored in order of the association operations, that is, temporally. Therefore we can read out a series of successive changesets simply by examining the bookmark associations in the order they are made. +- However, that's not true if we allow bookmarking of non-latest states, which is probably going to be desirable. We can't think ahead perfectly. Hmmm. +- Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These had better be consistent across all history tables. + +## Basic bookmark operations + +We think at present that the operation of "bookmarking" should be applied atomically across all history-tracked tables. That is, a bookmark does not associate to a single history record, but rather at a contemporaneous set of history records that represent a real point in time in the history of the database. + +### Bookmark the database state + +Motivation: This is the fundamental operation in bookmarking. + +Operation: Associate a bookmark to a set of history records, one per history-tracked table, that represent a real point in time of the history of the database. + +- When a bookmark is created at the actual point in time when the bookmarked database state exists, the bookmark is guaranteed to be valid. +- It is possible that we will want to bookmark a *past* state of the database after the fact. In that case, we will need a validity check to ensure that the set history records associated to the bookmark actually represent a true past state, and not just an arbitrary and inconsistent selection of history records. 
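As a rough illustration of the ideas so far (not part of the proposed implementation), the following Python sketch models each history table as an append-only list whose history ids ascend in temporal order. A bookmark association is then just one history id per table, and the delta between two associations is the range of records between them. The names `HistoryTable`, `bookmark_now`, and `delta` are hypothetical, invented for this sketch; a real implementation would live in the database and would have to take the per-table ids atomically.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryTable:
    """Append-only history for one collection; ids ascend in temporal order."""
    name: str
    records: list = field(default_factory=list)  # tuples of (hx_id, item_id, value)

    def append(self, item_id, value):
        hx_id = len(self.records) + 1  # next ascending history id
        self.records.append((hx_id, item_id, value))
        return hx_id

    def latest_id(self):
        # 0 means "no history yet" in this sketch (an assumption, not from the design)
        return self.records[-1][0] if self.records else 0


def bookmark_now(tables):
    """Associate a bookmark to the current state: one history id per table.

    This sketch is single-threaded, so atomicity is trivial here.
    """
    return {t.name: t.latest_id() for t in tables}


def delta(table, b1, b2):
    """Changes between b1 (exclusive) and b2 (inclusive), in history id order."""
    lo, hi = b1[table.name], b2[table.name]
    return [r for r in table.records if lo < r[0] <= hi]


obs = HistoryTable("obs_raw_hx")
stn = HistoryTable("meta_station_hx")
stn.append(item_id=1, value="Station A")
obs.append(item_id=10, value=1.5)
b1 = bookmark_now([obs, stn])
obs.append(item_id=10, value=1.7)  # revision of observation 10
obs.append(item_id=11, value=2.0)  # new observation
b2 = bookmark_now([obs, stn])
changes = delta(obs, b1, b2)
```

Here `changes` holds only the two observation history records added after `b1`, and the delta for the station history table is empty because no station changed between the two bookmarks.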
+ +### Bracket (or group) a set of updates + +Motivation/scenario: A set of related updates are received or made all at one time. The canonical case is a QA update of a large set of observations. (In other cases, e.g., when a scientist is updating things, it will take a certain amount of discipline to make sure that the updates are batched together like this.) + +Operation: *Within a transaction (i.e., atomically)*: + +- Bookmark the database with bookmark $B_1$. +- Perform updates. +- Bookmark the database with bookmark $B_2$. + +Changes between $B_1$ (exclusive) and $B_2$ (inclusive) are exactly and only those changes made in the updates. This is due to: + +- their isolation in the update transaction (so no other operations interleaved); +- the fact that change records are appended to the history tables in temporal order of change operations; therefore the last change operation is recorded at the end of the relevant history table. + +We call this operation bracketing or grouping, and denote it $Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the bookmarks and $U$ is the set of updates. + +For further discussion and analysis: + +- Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark prefix $B$, and allow the system to construct bookmarks $B_1$ and $B_2$ from $B$. We can then define $Bracket(B, U) = Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the constructed bookmarks. +- **BAD IDEA?** A bracket bookmark association could additionally include a time-of-bookmarking component, so that the same bookmark prefix can be used to construct bracket bookmarks for multiple applications of it. This time-of-bookmarking component can, in fact, be placed in the association, not the bookmark itself. I don't even know what this series of words means any more. + - Maybe better to let the bookmark name include any additional data needed than to try too hard to anticipate usage. 
+ - Not only that but there are doubtless usages in which time of bookmarking is not relevant. + - However, it is conceivable that, given that time of bookmarking is only a single additional column and might generically be useful -- as part of *bookmarking history* -- that we should include it anyway. + - **TODO**: think through the concept and possible implementations of bookmarking history. + + +## Applications + +### Database state reconstruction or Rollback + +#### Outline + +Due to the design of history tracking, it is easy in concept to reconstruct the complete state of the database from a bookmark. This is in fact what bookmarking means. + +Q: How do we do that? +A: Query the latest state of each item in the collection whose history id is less than or equal to the id of the record the bookmark points at. One solution is the following: For each collection (i.e., for each history-tracked table) +``` +SELECT DISTINCT ON (collection_item_id) * +FROM collection_hx +WHERE collection_hx_id <= collection_hx_id_from_bookmark +ORDER BY collection_item_id, collection_hx_id DESC +``` +returns the set of collection items that was current as of the bookmark. +#### Implementation considerations + +For metadata tables, which have relatively few records, the above query is likely not too time-consuming. For `obs_raw_hx`, this will scan a huge number of records. To make it perform better, further WHERE conditions may have to be added and possibly judicious indexing on the history tables. But see also below. + +Alternatively: Further to the problem of `obs_raw` being enormous, `obs_raw_hx` being even enormouser, and queries against it therefore taking very long times, here is a possibility, in which we create a separate rolled-back version of the database. + +In a separate schema, call it `crmp_rollback`, do the following. 
Given a bookmark: + +- Establish a replica of the `crmp` schema (i.e., table definitions without data) including main tables but excluding history tables (history tables are redundant here). +- Include FK relationships, indexes, and other things as needed. +- Duplicate the content of the non-history-tracked tables (at the time of this rollback). +- Populate the history-tracked tables using queries as above. +- Define and populate (one row) a rollback table that contains at least the following information: + - bookmark id + - timestamp when this rollback was established + - id of user creating the rollback + - any other important information not retrievable with this data +- Possibly make all tables read-only. +- Possibly make this schema accessible only by a specific role which is granted only to the user(s) who need this version of the database. + +And, in the main `crmp` schema, define a stored procedure that does all of the above: + +- Given a bookmark id and rollback schema name +- In a valid order (respecting foreign key dependencies) +- With optimized queries insofar as possible +### Efficient grouping of data versions + +CRMP partners periodically release new versions of a dataset, where each item in the dataset is defined by variable, station history, and observation time. Such releases typically contain many thousands of observations. A version release amounts to an update to each such datum. + +Releases typically are, in time order: +1. A raw dataset. +2. A QA-adjusted dataset. + +Each observation in the second release is a revision to an observation in the first release. +#### An unsatisfactory approach + +It is possible to group the observations in such a release individually, by associating a bookmark to each updated item's history record. This requires the same number of bookmark associations as there are observations, i.e., it scales linearly with the number of observations. This is not desirable and should be avoided if at all possible. 
+#### Bracketing + +An alternative (if operationally possible, meaning depending on how the new release becomes available): +- Let $U$ be the set of updates that add the new release. +- Define bookmarks $B_1$ and $B_2$ to demarcate (bracket) the release. +- Perform $Bracket(B_1, U, B_2)$. + +Bracketing a release in this way requires only constant time and space, and makes retrieval of the dataset faster. +##### of large datasets (QA releases, other updates) + +If it not operationally possible to perform the updates for a new release within a single transaction, this approach can generalized to a small number of bracketing operations which together encompass the whole release. This will still significantly reduce the space and time required to bracket and retrieve a large group of observations. +##### for regular ingestion (`crmprtd`) + +Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. Typically, hundreds to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can therefore use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, but not in fact one observation at a time. + +Considerations: +- It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. This would enable an entire raw dataset to be extracted with a minimum of fuss. Here is where the idea of a bookmark association including a temporal column (or a more general informational text column) would be useful. Details (reference also Implementation notes below): + - +## Implementation notes + +We begin to see the outlines of a plausible implementation, as follows. 
+### Tables + +**Table `bookmarks`** + +| Column | Type | Remarks | +| -------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `bookmark_id` | `int` | PK | +| `name` | `text` | | +| ? `comment` | `text` | Meaning? Utility? | +| ? `network_id` | `int` | FK `meta_network`. Utility is to distinguish bookmarks for one network from another, and allow a simple, natural name in common, such as 'QA'. Normalizes a common use case, I think. Tempting to make nullable, but caution, nullable columns have frequently been abused in CRMP. | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | +Constraints + +- unique (`name`, `network_id`) + +Questions: + +1. Apply history tracking to this table? Reason, utility? + +**Table `bookmark_associations`** + +Q: Why separate association from bookmark proper? + +A: +- Brackets share the same bookmark info, but are associated as bracket-begin, bracket-end. +- We likely want to bracket datasets ingested by `crmprtd` using the same bookmark. +- Nevertheless the normalization here seems a bit awkward, or rather to permit more than valid usage. Some further considering required. + +| Column | Type | Remarks | +| ------------------------- | ------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `bookmark_association_id` | `int` | PK | +| `bookmark_id` | `int` | FK `bookmarks` | +| ? 
`type` | ?`enumeration`; ?`int` (FK to type table); ?`text`. Values: `'bookmark'`, `'bracket-begin'`, `'bracket-end'`. | Still some question to the wisdom of encoding this aspect of bookmark usage with a separate column. If we do, enumeration type might be best. | +| ? `aux` | ?`timestamp`; ?`text` | For distinguishing multiple associations of the same bookmark. Specific usages: `crmprtd` ingestion; possibly regular QA releases. Will require discipline on the part of the user in order not to make a mess. Need better name; will depend in part on type (`timestamp` vs. more general `text`). | +| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | +| `meta_network_hx_id` | `int` | PK `meta_network_hx` | +| `meta_station_hx_id` | `int` | PK `meta_station_hx` | +| `meta_history_hx_id` | `int` | PK `meta_history_hx` | +| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | +Constraints: + +- unique (`bookmark_id`, `type`, `aux`) + +Questions: + +1. Apply history tracking to this table? Reason, utility? + +### Functions, stored procedures + +Since bookmarking is a non-trivial activity, it will be useful to encode it in stored procedures. There is some question of whether some or all of this should instead be utility code in the PyCDS repo proper and not SP's within the database, but we'll mix em all up here as SP's. + +1. Create a bookmark association at current time (current state of database). +2. Perform bracketing operation. + 1. Create bookmark(s). + 2. Create bracket-begin association. + 3. Apply updates. + 4. Create bracket-end association. +3. Check tuple validity. +4. Create a bookmark association from a past state (history tuple). Check validity of tuple. +5. Determine support (see discussion) of an observation. Result is a valid history tuple. Can then create bookmark association to it. + +### Triggers + +1. Enforce values of `mod_time`, `mod_user`. (As for history tracking.) 
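The bracketing procedure listed under "Functions, stored procedures" can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the dictionaries stand in for the history tables and the `bookmark_associations` table, the function names `latest_ids`, `bracket`, and `bracketed_changes` are hypothetical, and `aux` is used as a plain text label to tell apart repeated uses of the same bookmark. In a real database the body of `bracket` would run inside a single transaction.

```python
def latest_ids(history):
    """History tuple for the current state: last history id per table (0 if empty)."""
    return {name: (rows[-1]["hx_id"] if rows else 0) for name, rows in history.items()}


def bracket(history, associations, bookmark_id, updates, aux=None):
    """Bracket `updates` between a bracket-begin and a bracket-end association.

    Single-threaded sketch: nothing can interleave here, which a real
    implementation would have to guarantee with a transaction.
    """
    associations.append(
        {"bookmark_id": bookmark_id, "type": "bracket-begin", "aux": aux,
         "tuple": latest_ids(history)})
    for table, item_id, value in updates:
        rows = history[table]
        rows.append({"hx_id": len(rows) + 1, "item_id": item_id, "value": value})
    associations.append(
        {"bookmark_id": bookmark_id, "type": "bracket-end", "aux": aux,
         "tuple": latest_ids(history)})


def bracketed_changes(history, associations, bookmark_id, table, aux=None):
    """Rows of `table` after bracket-begin (exclusive) up to bracket-end (inclusive)."""
    ends = {a["type"]: a["tuple"][table] for a in associations
            if a["bookmark_id"] == bookmark_id and a["aux"] == aux}
    return [r for r in history[table]
            if ends["bracket-begin"] < r["hx_id"] <= ends["bracket-end"]]


history = {"obs_raw_hx": [], "meta_station_hx": []}
associations = []
bracket(history, associations, bookmark_id=1, aux="2025-08-12",
        updates=[("obs_raw_hx", 10, 1.5), ("obs_raw_hx", 11, 2.0)])
bracket(history, associations, bookmark_id=1, aux="2025-08-13",
        updates=[("obs_raw_hx", 10, 1.7)])
first = bracketed_changes(history, associations, 1, "obs_raw_hx", aux="2025-08-12")
second = bracketed_changes(history, associations, 1, "obs_raw_hx", aux="2025-08-13")
```

Each call to `bracket` adds exactly two associations regardless of how many updates it wraps, and the triple (`bookmark_id`, `type`, `aux`) identifies each association, in line with the unique constraint proposed for `bookmark_associations`.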
+## History tuples, database subset, and validity + +Our goal here is to check whether a given set of history id's is valid, i.e., does it represent a real, consistent historical state of the database. This will be useful when attempting to define a bookmark association post hoc. + +**History tuples**: We can regard a bookmark association as a tuple of collection history id's, one per collection (hereafter a history id tuple, history tuple, or just tuple). + +**Database history subset**: A history tuple defines a subset of the database history, namely all those history records in each history table that occur before the corresponding history id in the tuple. Under the reasonable assumption (see above) that the temporal order of history records is the same as the history id order, "before" here means that history id is less than or equal to the history id in the tuple. + +**Validity**: Not all such database subsets, and therefore such tuples, are valid. + +The criterion for validity is essentially referential integrity. That is, within the subset of history records implied by the tuple, all references by a history record in that subset to another history record must also be found in the subset. Otherwise (i.e., when there is a violation of referential integrity within the subset) the database history subset is not valid. + +History id tuples are valid iff they imply a valid database history subset. + +**Algorithms for checking tuple validity**: Given this definition, algorithms for checking tuple validity are straightforward: + +1. The naive algorithm checks every reference in any given subset for presence in the referenced collection history subset. But this is a huge number of records in most cases. +2. A less naive and much faster algorithm relies on the assumption that every actually occurring historical state of the database was valid (quite a reasonable assumption!), and therefore that history tables reflect that. 
This assumption allows us to check only that reference history id's are less than or equal to the corresponding collection history id in the tuple. + +**Applications**: + +1. Creating a (valid) bookmark association post hoc. +2. It is possible that a history tuple may be presented for checkout (see below) that is not known to be valid. +3. It is also possible that a single point of a single collection may be presented and we wish to construct a valid database historical state from it. Validity criteria allow us to do this. + +## Metadata support set + +This section may not be all that useful any more ... but I include it for consideration. + +The idea of the "metadata support" may prove useful in talking clearly about bookmarking. In particular, it may prove useful in discussing bookmarking or bracketing a set of observations post hoc. From here on, we may abbreviate "metadata support" to "support". + +Support enables us to talk in a well-defined, compact way about the metadata relevant to an observation (or set of observations), when the observations are the only handle you have at the outset. More accurately, we should say observation histories, since observations are mutable and not the target of bookmarking. + +The support of an observation history record $X$ is the set of metadata (history) records directly relevant to $X$, which is to say directly associated to $X$ by one or more FK links away from the observation. This in fact applies to any history record $X$, but observation histories are the most important and are the most general or complex case. + +### Definitions + +We define 2 particular cases of support that are especially relevant: + +#### Historical support, $Sh(X)$ + +- The *historical (metadata) support* of observation history record $X$, denoted $Sh(X)$, is the tuple of metadata history records linked to it via history table foreign keys followed directly from one history record to another. 
+- There is always exactly one of each metadata history record type (Network, Station, Station History, Variable) in this tuple. +- This tuple is the precise metadata state at the time of creation of $X$. +- This tuple *does not change* when updates to the corresponding metadata items are made. + +We can easily generalize this to a set $S$ of history records: + +- $Sh(S) = \bigcup_{X \in S} Sh(X)$ + +#### Current (or latest) support, $Sc(X)$ + +- The *latest (metadata) support* of observation history record $X$, denoted $Sc(X)$, is the set of metadata history records defined as: For each record in $Sh(X)$, use the current latest record for that metadata item; equivalently, use the metadata *item* foreign key to retrieve that record from the primary table. +- There is always exactly one of each metadata history record type (Network, Station, Station History, Variable) in this set. +- This set provides the *current state* of metadata relevant to $X$, with all updates to those items. This set *changes* when the corresponding metadata items are updated, and is not fixed over time. + +We can easily generalize this to a set $S$ of history records: + +- $Sc(S) = \bigcup_{X \in S} Sc(X)$ + +#### Support at tag, `St(X,T)` + +Is this still relevant? It seems that if we define bookmarking as an association to a tuple of history records, then this whole thing is redundant. + +A slightly less self-evident case of support: + +- The support set of observation history record $X$ at tag `T`, denoted `St(X,T)`, is the set of metadata history records defined by: For each record in $Sh(X)$, use the metadata history record tagged by `T` for that metadata item. +- There may be no such metadata history record for some or all of the elements of $Sc(X)$. Therefore `St(X,T)` may not contain one item for every metadata record type. +- Tag `T` can tag *any* metadata history record in an item's history set. 
Therefore the elements of `St(X,T)` may occur *before* the historical support items for $X$. This may or may not make sense in any given context. + +It is possible to define other support sets with different criteria for what metadata history records are included, but defining the criteria so that they are consistent and make sense is harder. We do not offer any other definitions here. \ No newline at end of file From dd0fa9ef9cc9ce1a21e5618a99c0654e374cade1 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Wed, 13 Aug 2025 11:23:50 -0700 Subject: [PATCH 02/13] Update --- docs/bookmarks.md | 107 ++++++++++++++++++++++++++-------------------- 1 file changed, 61 insertions(+), 46 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index 47b242a7..5b6b7c2c 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -1,6 +1,6 @@ -# Bookmarks aka versions aka tags aka commits aka ... +# Bookmarks aka version tags aka commits aka ... -OK, let's use the term "bookmark" for now. Single word, not otherwise used, seemingly intuitive meaning, easy to search/replace. +Let's use the term "bookmark" for now. Single word, not otherwise used, seemingly intuitive meaning, easy to search/replace. We want to be able to "bookmark" a state of the database (i.e., bookmark a point in its history) with an identifier that can be used to reconstruct that state after subsequent changes to the database. The nominal use case is "bookmark the present state," but there are definitely other possibilities. @@ -14,7 +14,7 @@ We want to be able to "bookmark" a state of the database (i.e., bookmark a point - History tables are append-only. - Each history table records the changes made to the entire collection *in temporal order of the changes*. -- Each successive update to the collection is recorded by adding a record at the end, which so that temporal order is also the order by ascending unique history id. 
+- Each successive update to the collection is recorded by adding a record at the end, so that temporal order is also the order by ascending unique history id. **Assumptions** @@ -25,29 +25,30 @@ We want to be able to "bookmark" a state of the database (i.e., bookmark a point - If a bookmark is associated to a record in a history table, it represents the history of that collection up to that point in time. A bookmark can be thought of by analogy with a Git tag, in the sense that both are pointers to a specific state of the relevant items. - Two such bookmark associations, say $B_1$ and $B_2$, bracket a set of changes recorded in the history table. The delta between them is exactly those changes recorded in the history table, in history id order, between $B_1$ (exclusive) and $B_2$ (inclusive). - **FIXME**: Bookmark associations can be, and most naturally are, stored in order of the association operations, that is, temporally. Therefore we can read out a series of successive changesets simply by examining the bookmark associations in the order they are made. - However, that's not true if we allow bookmarking of non-latest states, which is probably going to be desirable. We can't think ahead perfectly. Hmmm. - Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These had better be consistent across all history tables. ## Basic bookmark operations -We think at present that the operation of "bookmarking" should be applied atomically across all history-tracked tables. That is, a bookmark does not associate to a single history record, but rather at a contemporaneous set of history records that represent a real point in time in the history of the database. +We believe firmly at present that the operation of "bookmark this state" should be applied atomically across all history-tracked tables. 
That is, a bookmark does not associate to a single history table record, but rather to a contemporaneous set of records, one per history table, that represent a real point in time in the history of the database; creating this association is a single atomic operation. ### Bookmark the database state -Motivation: This is the fundamental operation in bookmarking. +**Motivation**: This is the fundamental operation in bookmarking. + +**Operation**: Associate a bookmark to a set of history records, one per history-tracked table, that represent a real point in time of the history of the database. This operation is atomic. -Operation: Associate a bookmark to a set of history records, one per history-tracked table, that represent a real point in time of the history of the database. +Notes: - When a bookmark is created at the actual point in time when the bookmarked database state exists, the bookmark is guaranteed to be valid. - It is possible that we will want to bookmark a *past* state of the database after the fact. In that case, we will need a validity check to ensure that the set history records associated to the bookmark actually represent a true past state, and not just an arbitrary and inconsistent selection of history records. ### Bracket (or group) a set of updates -Motivation/scenario: A set of related updates are received or made all at one time. The canonical case is a QA update of a large set of observations. (In other cases, e.g., when a scientist is updating things, it will take a certain amount of discipline to make sure that the updates are batched together like this.) +**Motivation/scenario**: A set of related updates are received or made all at one time. The canonical case is a QA update of a large set of observations. (In other cases, e.g., when a scientist is updating things, it will take a certain amount of discipline to make sure that the updates are batched together like this.) 
-Operation: *Within a transaction (i.e., atomically)*: +**Operation**: *Within a transaction (i.e., atomically)*: - Bookmark the database with bookmark $B_1$. - Perform updates. @@ -60,7 +61,7 @@ Changes between $B_1$ (exclusive) and $B_2$ (inclusive) are exactly and only tho We call this operation bracketing or grouping, and denote it $Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the bookmarks and $U$ is the set of updates. -For further discussion and analysis: +**For further discussion and analysis**: - Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark prefix $B$, and allow the system to construct bookmarks $B_1$ and $B_2$ from $B$. We can then define $Bracket(B, U) = Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the constructed bookmarks. - **BAD IDEA?** A bracket bookmark association could additionally include a time-of-bookmarking component, so that the same bookmark prefix can be used to construct bracket bookmarks for multiple applications of it. This time-of-bookmarking component can, in fact, be placed in the association, not the bookmark itself. I don't even know what this series of words means any more. @@ -76,10 +77,10 @@ For further discussion and analysis: #### Outline -Due to the design of history tracking, it is easy in concept to reconstruct the complete state of the database from a bookmark. This is in fact what bookmarking means. +The design of history tracking makes it easy in concept to reconstruct the complete state of the database from a bookmark. This is in fact what bookmarking is for. Q: How do we do that? -A: Query the latest state of each item in the collection whose history id is less than or equal to the id of the record the bookmark points at. 
One solution is the following: For each collection (i.e., for each history-tracked table) +A: Query the latest state of each item in the collection whose history id is less than or equal to the id of the record the bookmark points at. One such query is the following: For each collection (i.e., each history-tracked table) ``` SELECT DISTINCT ON (collection_item_id) * FROM collection_hx @@ -89,9 +90,9 @@ ORDER BY collection_hx_id DESC returns the set of collection items that was current as of the bookmark. #### Implementation considerations -For metadata tables, which have relatively few records, the above query is likely not too time-consuming. For `obs_raw_hx`, this will scan a huge number of records. To make it perform better, further WHERE conditions may have to be added and possibly judicious indexing on the history tables. But see also below. +For metadata tables, which have few records, the above query is likely fast. For `obs_raw_hx`, it will scan a huge number of records. To make it perform better, further WHERE conditions may have to be added and possibly judicious indexing on the history tables. But see also below. -Alternatively: Further to the problem of `obs_raw` being enormous, `obs_raw_hx` being even enormouser, and queries against it therefore taking very long times, here is a possibility, in which we create a separate rolled-back version of the database. +Alternatively: Further to the problem of `obs_raw_hx` being enormous, and queries against it therefore taking very long times, here is a possibility, in which we create a separate rolled-back version of the database. In a separate schema, call it `crmp_rollback`, do the following. 
Given a bookmark: @@ -112,13 +113,18 @@ And, in the main `crmp` schema, define a stored procedure that does all of the a - Given a bookmark id and rollback schema name - In a valid order (respecting foreign key dependencies) - With optimized queries insofar as possible + +Once the rollback schema is populated with data reflecting a given point in history, the users with interest in it can query it as if it is the actual CRMP database. It will not in general be wise to allow the users to modify this database, since those modifications will not under any circumstances be propagated to the real database. For experimental purposes, with appropriate, loudly stated caveats, this rule may be relaxed. + +Any number of rollback schemas can be established. They do not interact with each other, nor with the live CRMP database. But because each rollback schema will be comparable in size to the live database, we may wish to limit their number and their lifetimes. ### Efficient grouping of data versions -CRMP partners periodically release new versions of a dataset, where each item in the dataset is defined by variable, station history, and observation time. Such releases typically contain many thousands of observations. A version release amounts to an update to each such datum. +CRMP partners periodically release new versions of a dataset. Such releases typically contain many thousands of observations. A version release amounts to an update to each such datum. Releases typically are, in time order: -1. A raw dataset. -2. A QA-adjusted dataset. + +1. A raw dataset. This dataset frequently arrives incrementally, via `crmprtd`. +2. A QA-adjusted dataset. This dataset is expected to arrive in one or a few large batches. Each observation in the second release is a revision to an observation in the first release. 
#### An unsatisfactory approach @@ -126,22 +132,28 @@ Each observation in the second release is a revision to an observation in the fi It is possible to group the observations in such a release individually, by associating a bookmark to each updated item's history record. This requires the same number of bookmark associations as there are observations, i.e., it scales linearly with the number of observations. This is not desirable and should be avoided if at all possible. #### Bracketing -An alternative (if operationally possible, meaning depending on how the new release becomes available): +Bracketing uses only two bookmarks to demarcate the beginning and end of a batch of updates: + - Let $U$ be the set of updates that add the new release. - Define bookmarks $B_1$ and $B_2$ to demarcate (bracket) the release. - Perform $Bracket(B_1, U, B_2)$. -Bracketing a release in this way requires only constant time and space, and makes retrieval of the dataset faster. -##### of large datasets (QA releases, other updates) +Bracketing requires only constant time and space relative to the number of updates within it. It significantly improves both memory footprint and retrieval time. +##### Bracketing for large datasets (QA releases, other updates) -If it not operationally possible to perform the updates for a new release within a single transaction, this approach can generalized to a small number of bracketing operations which together encompass the whole release. This will still significantly reduce the space and time required to bracket and retrieve a large group of observations. -##### for regular ingestion (`crmprtd`) +QA releases and other updates are expected to arrive in large batches. If it is not operationally possible to perform the updates for a new release within a single transaction, this approach can generalized to a small number of bracketing operations which together encompass the whole release. 
This will still significantly reduce the space and time required to bracket and retrieve a large group of observations. +##### Bracketing for regular ingestion (`crmprtd`) -Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. Typically, hundreds to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can therefore use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, but not in fact one observation at a time. +Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. Typically, dozens to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, and is linear, not constant space and time in the number of observations, but it is better than bookmarking one observation at a time. Considerations: -- It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. This would enable an entire raw dataset to be extracted with a minimum of fuss. Here is where the idea of a bookmark association including a temporal column (or a more general informational text column) would be useful. Details (reference also Implementation notes below): - - + +- It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. This would enable an entire raw dataset to be extracted with simpler and more error-resistant queries. Here is where the idea of a bookmark association including a temporal column (or a more general informational text column) would be useful. +- In some sense a fully normalized representation for this case includes: + - A single bookmark associated multiple times to a series of groups of observations. 
+ - One part (column) of the association distinguishes bracket-start and bracket-end. + - Another part (column) of the association distinguishes the group. Time of ingestion is a natural discriminator for this, but it may be too restricted for possible more general uses of "many groups labelled by a single bookmark". +- For more details, see Implementation notes below. ## Implementation notes We begin to see the outlines of a plausible implementation, as follows. @@ -157,6 +169,7 @@ We begin to see the outlines of a plausible implementation, as follows. | ? `network_id` | `int` | FK `meta_network`. Utility is to distinguish bookmarks for one network from another, and allow a simple, natural name in common, such as 'QA'. Normalizes a common use case, I think. Tempting to make nullable, but caution, nullable columns have frequently been abused in CRMP. | | `mod_user` | `text` | | | `mod_time` | `timestamp` | | + Constraints - unique (`name`, `network_id`) @@ -174,22 +187,24 @@ A: - We likely want to bracket datasets ingested by `crmprtd` using the same bookmark. - Nevertheless the normalization here seems a bit awkward, or rather to permit more than valid usage. Some further considering required. -| Column | Type | Remarks | -| ------------------------- | ------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `bookmark_association_id` | `int` | PK | -| `bookmark_id` | `int` | FK `bookmarks` | -| ? `type` | ?`enumeration`; ?`int` (FK to type table); ?`text`. Values: `'bookmark'`, `'bracket-begin'`, `'bracket-end'`. | Still some question to the wisdom of encoding this aspect of bookmark usage with a separate column. 
If we do, enumeration type might be best. | -| ? `aux` | ?`timestamp`; ?`text` | For distinguishing multiple associations of the same bookmark. Specific usages: `crmprtd` ingestion; possibly regular QA releases. Will require discipline on the part of the user in order not to make a mess. Need better name; will depend in part on type (`timestamp` vs. more general `text`). | -| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | -| `meta_network_hx_id` | `int` | PK `meta_network_hx` | -| `meta_station_hx_id` | `int` | PK `meta_station_hx` | -| `meta_history_hx_id` | `int` | PK `meta_history_hx` | -| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | +| Column | Type | Remarks | +| ------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `bookmark_association_id` | `int` | PK | +| `bookmark_id` | `int` | FK `bookmarks` | +| ? `type` | ?`enumeration`; ?`int` (FK to type table); ?`text`. Values: `'bookmark'`, `'bracket-begin'`, `'bracket-end'`. | Still some question to the wisdom of encoding this aspect of bookmark usage with a separate column. If we do, enumeration type might be best. | +| ? `group` | ?`timestamp`; ?`int`; ?`text` | For distinguishing multiple associations of the same bookmark. Specific usages: `crmprtd` ingestion; possibly for regular QA releases. Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). 
| +| ? `aux_info` | `text` | Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that it is an integer. | +| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | +| `meta_network_hx_id` | `int` | PK `meta_network_hx` | +| `meta_station_hx_id` | `int` | PK `meta_station_hx` | +| `meta_history_hx_id` | `int` | PK `meta_history_hx` | +| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | + Constraints: -- unique (`bookmark_id`, `type`, `aux`) +- unique (`bookmark_id`, `type`, `group`) Questions: @@ -197,21 +212,21 @@ Questions: ### Functions, stored procedures -Since bookmarking is a non-trivial activity, it will be useful to encode it in stored procedures. There is some question of whether some or all of this should instead be utility code in the PyCDS repo proper and not SP's within the database, but we'll mix em all up here as SP's. +Since bookmarking is a non-trivial activity, it will be useful to encode it in stored procedures. There is some question of whether some or all of this should instead be utility Python code in the PyCDS repo proper and not SP's within the database, but we'll mix em all up here in one list. 1. Create a bookmark association at current time (current state of database). -2. Perform bracketing operation. +2. Check tuple validity. +3. Create a bookmark association from a past state (history tuple). Check validity of tuple. +4. Determine support (see discussion) of an observation. Result is a valid history tuple. Can then create bookmark association to it. +5. Perform bracketing operation. 1. Create bookmark(s). 2. Create bracket-begin association. 3. Apply updates. 4. Create bracket-end association. -3. Check tuple validity. -4. Create a bookmark association from a past state (history tuple). Check validity of tuple. -5. Determine support (see discussion) of an observation. Result is a valid history tuple. 
Can then create bookmark association to it. ### Triggers -1. Enforce values of `mod_time`, `mod_user`. (As for history tracking.) +1. Enforce values of `mod_time`, `mod_user` in `bookmarks` and `bookmark_associations`. (As for history tracking; reuse tf.) ## History tuples, database subset, and validity Our goal here is to check whether a given set of history id's is valid, i.e., does it represent a real, consistent historical state of the database. This will be useful when attempting to define a bookmark association post hoc. @@ -239,7 +254,7 @@ History id tuples are valid iff they imply a valid database history subset. ## Metadata support set -This section may not be all that useful any more ... but I include it for consideration. +**Note/TODO**: This section may not be all that useful any more ... but I include it for consideration. It may also be overcomplicated ... the support of a set of observations may be more general than is really useful. It might be better to consider the support of only the earliest and latest records in the set, since those effectively bracket the group of observations. But, post-hoc, i.e., non-atomically, that bracketing is too large, so we will need to look at some notion of contiguous groups if that is possible. Oy vey, more work to do here. The idea of the "metadata support" may prove useful in talking clearly about bookmarking. In particular, it may prove useful in discussing bookmarking or bracketing a set of observations post hoc. From here on, we may abbreviate "metadata support" to "support". 
From b33a5c8e2febf4cfabdaa23ce8acba0390b03ac4 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Thu, 14 Aug 2025 17:04:04 -0700 Subject: [PATCH 03/13] Update --- docs/bookmarks.md | 158 ++++++++++++++++++++++++++++------------------ 1 file changed, 96 insertions(+), 62 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index 5b6b7c2c..27a1b8e7 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -1,12 +1,36 @@ # Bookmarks aka version tags aka commits aka ... -Let's use the term "bookmark" for now. Single word, not otherwise used, seemingly intuitive meaning, easy to search/replace. - -We want to be able to "bookmark" a state of the database (i.e., bookmark a point in its history) with an identifier that can be used to reconstruct that state after subsequent changes to the database. The nominal use case is "bookmark the present state," but there are definitely other possibilities. - +# Table of contents + + - [Terminology](#terminology) + - [Facts and assumptions](#facts-and-assumptions) + - [Basic bookmark operations](#basic-bookmark-operations) + - [Create a bookmark](#create-a-bookmark) + - [Bookmark the database state (create a bookmark association)](#bookmark-the-database-state-create-a-bookmark-association) + - [Bracket (or group) a set of updates](#bracket-or-group-a-set-of-updates) + - [Applications](#applications) + - [Database state reconstruction or Rollback](#database-state-reconstruction-or-rollback) + - [Outline](#outline) + - [Implementation considerations](#implementation-considerations) + - [Efficient grouping of data versions](#efficient-grouping-of-data-versions) + - [An unsatisfactory approach](#an-unsatisfactory-approach) + - [Bracketing](#bracketing) + - [Bracketing for large datasets (QA releases, other updates)](#bracketing-for-large-datasets-qa-releases-other-updates) + - [Bracketing for regular ingestion (`crmprtd`)](#bracketing-for-regular-ingestion-crmprtd) + - [Considerations](#considerations) + - [Implementation 
notes](#implementation-notes) + - [Tables](#tables) + - [Functions, stored procedures](#functions-stored-procedures) + - [Triggers](#triggers) + - [History tuples, database subset, and validity](#history-tuples-database-subset-and-validity) + - [Metadata support set](#metadata-support-set) + - [Definitions](#definitions) + - [Historical support, $Sh(X)$](#historical-support-shx) + - [Current (or latest) support, $Sc(X)$](#current-or-latest-support-scx) + - [Support at tag, `St(X,T)`](#support-at-tag-stxt) ## Terminology -**Bookmark**: A named object that points at a set of history records, one per history-tracked table. This definition conflates the notion of a bookmark proper and the association of it to history records, which actually are distinct, but it will suffice for now. +- **Bookmark**: A named object that points at a set of history records, one per history-tracked table. This definition conflates the notion of a bookmark proper and the association of it to history records, which actually are distinct, but it will suffice for now. ## Facts and assumptions @@ -14,7 +38,7 @@ We want to be able to "bookmark" a state of the database (i.e., bookmark a point - History tables are append-only. - Each history table records the changes made to the entire collection *in temporal order of the changes*. -- Each successive update to the collection is recorded by adding a record at the end, which so that temporal order is also the order by 0ascending unique history id. +- Each successive update to the collection is recorded by appending a record to the history table; therefore temporal order is also the order by ascending history id. **Assumptions** @@ -25,51 +49,54 @@ We want to be able to "bookmark" a state of the database (i.e., bookmark a point - If a bookmark is associated to a record in a history table, it represents the history of that collection up to that point in time. 
A bookmark can be thought of by analogy with a Git tag, in the sense that both are pointers to a specific state of the relevant items. - Two such bookmark associations, say $B_1$ and $B_2$, bracket a set of changes recorded in the history table. The delta between them is exactly those changes recorded in the history table, in history id order, between $B_1$ (exclusive) and $B_2$ (inclusive). -**FIXME**: Bookmark associations can be, and most naturally are, stored in order of the association operations, that is, temporally. Therefore we can read out a series of successive changesets simply by examining the bookmark associations in the order they are made. -- However, that's not true if we allow bookmarking of non-latest states, which is probably going to be desirable. We can't think ahead perfectly. Hmmm. -- Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These had better be consistent across all history tables. +**For further consideration** + +- Bookmark associations can be, and most naturally are, stored in order of the association operations, that is, temporally. Therefore we can read out a series of successive changesets simply by examining the bookmark associations in the order they are made. + - However, that's not true if we allow bookmarking of non-latest states, which is probably going to be desirable. We can't think ahead perfectly. Hmmm. + - Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These should be consistent across all history tables; verify this thinking. ## Basic bookmark operations -We believe firmly at present that the operation of "bookmark this state" should be applied atomically across all history-tracked tables. 
That is, a bookmark does not associate to a single history table record, but rather to a contemporaneous set of records, one per history table, that represent a real point in time in the history of the database, and that this is an atomic operation. +We believe that the operation of "bookmark the current state" should be applied atomically across all history-tracked tables. That is, a bookmark does not associate to a single history table record, but rather to a contemporaneous set of records, one per history table, that represent a real point in time in the history of the database; creating this association is itself an atomic operation. + +Post-hoc bookmarking raises slightly more difficult issues, but the principle remains the same: each bookmark should represent a real point in the history of the database. -### Bookmark the database state +### Create a bookmark -**Motivation**: This is the fundamental operation in bookmarking. +**Motivation**: A bookmark is a data object. It is independent of the database states it is associated to. We need to be able to create arbitrary bookmarks. -**Operation**: Associate a bookmark to a set of history records, one per history-tracked table, that represent a real point in time of the history of the database. This operation is atomic. +**Operation**: The act of creating a new bookmark $B$, eliding the details of bookmark implementation, is denoted $B = CreateBookmark(N, ...)$ where $N, ...$ is the bookmark name and other details. -Notes: +### Bookmark the database state (create a bookmark association) -- When a bookmark is created at the actual point in time when the bookmarked database state exists, the bookmark is guaranteed to be valid. -- It is possible that we will want to bookmark a *past* state of the database after the fact. 
In that case, we will need a validity check to ensure that the set history records associated to the bookmark actually represent a true past state, and not just an arbitrary and inconsistent selection of history records. +**Motivation**: This is the fundamental operation in bookmarking. "Database state" means "an actual historical state of the database as it performs updates to history-tracked tables". Such a state is represented by a tuple of history table id's, one per history table. + +- Such a tuple is automatically valid if it represents the current state of the database. +- We will very likely want to bookmark a *past* state of the database after the fact. In that case, we need to check that the tuple of history records associated to the bookmark actually represents a true past state, and not just an arbitrary and inconsistent selection of history records. For an answer to this, see [[#History tuples, database subset, and validity]]. + +**Operation**: Let $B$ be a bookmark. Let $S$ be a valid state of the database, represented as a tuple of history id's. Then, eliding the details of the association data object: +- $Bookmark(B, S)$ denotes the atomic operation of associating bookmark $B$ to state $S$. +- The shorthand $Bookmark(B)$ is defined as $Bookmark(B, S)$ where $S$ is the current state of the database. ### Bracket (or group) a set of updates **Motivation/scenario**: A set of related updates are received or made all at one time. The canonical case is a QA update of a large set of observations. (In other cases, e.g., when a scientist is updating things, it will take a certain amount of discipline to make sure that the updates are batched together like this.) -**Operation**: *Within a transaction (i.e., atomically)*: +**Operation**: Let $B_1$ and $B_2$ be two bookmarks. Let $U$ be a set of updates. Then the operation $Bracket(B_1, U, B_2)$ is defined as: -- Bookmark the database with bookmark $B_1$. -- Perform updates. 
-- Bookmark the database with bookmark $B_2$. +- *Within a transaction (i.e., atomically)*: + - Perform $Bookmark(B_1)$. + - Perform updates $U$. + - Perform $Bookmark(B_2)$. Changes between $B_1$ (exclusive) and $B_2$ (inclusive) are exactly and only those changes made in the updates. This is due to: - their isolation in the update transaction (so no other operations interleaved); - the fact that change records are appended to the history tables in temporal order of change operations; therefore the last change operation is recorded at the end of the relevant history table. -We call this operation bracketing or grouping, and denote it $Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the bookmarks and $U$ is the set of updates. - **For further discussion and analysis**: -- Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark prefix $B$, and allow the system to construct bookmarks $B_1$ and $B_2$ from $B$. We can then define $Bracket(B, U) = Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the constructed bookmarks. -- **BAD IDEA?** A bracket bookmark association could additionally include a time-of-bookmarking component, so that the same bookmark prefix can be used to construct bracket bookmarks for multiple applications of it. This time-of-bookmarking component can, in fact, be placed in the association, not the bookmark itself. I don't even know what this series of words means any more. - - Maybe better to let the bookmark name include any additional data needed than to try too hard to anticipate usage. - - Not only that but there are doubtless usages in which time of bookmarking is not relevant. - - However, it is conceivable that, given that time of bookmarking is only a single additional column and might generically be useful -- as part of *bookmarking history* -- that we should include it anyway. - - **TODO**: think through the concept and possible implementations of bookmarking history. 
- +- Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark prefix $B$, and allow the system to construct bookmarks $B_1$ and $B_2$ from $B$. We can then define $Bracket(B, U) = Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the constructed bookmarks. (Update: See use of auxiliary columns in bookmark association table in [[#Implementation notes]] below.) ## Applications @@ -77,10 +104,10 @@ We call this operation bracketing or grouping, and denote it $Bracket(B_1, U, B_ #### Outline -The design of history tracking makes easy in concept to reconstruct the complete state of the database from a bookmark. This is in fact what bookmarking is for. +The design of history tracking makes it easy in concept to reconstruct the complete state of the database from a bookmark (or more precisely a bookmark association). This is in fact what bookmarking is for. Q: How do we do that? -A: Query the latest state of each item in the collection whose history id is less than or equal to the id of the record the bookmark points at. One such query is the following: For each collection (i.e., each history-tracked table) +A: Query the latest state of each item in the collection whose history id is less than or equal to the id of the record the bookmark association points at. One such query is the following: For each collection (i.e., each history-tracked table) ``` SELECT DISTINCT ON (collection_item_id) * FROM collection_hx @@ -88,6 +115,7 @@ WHERE collection_hx_id <= collection_hx_id_from_bookmark ORDER BY collection_hx_id DESC ``` returns the set of collection items that was current as of the bookmark. + #### Implementation considerations For metadata tables, which have few records, the above query is likely fast. For `obs_raw_hx`, it will scan a huge number of records. To make it perform better, further WHERE conditions may have to be added and possibly judicious indexing on the history tables. But see also below. 
@@ -101,7 +129,7 @@ In a separate schema, call it `crmp_rollback`, do the following. Given a bookmar - Duplicate the content of the non-history tracked tables (at the time of of this rollback). - Populate the history-tracked tables using queries as above. - Define and populate (one row) a rollback table that contains at least the following information: - - bookmark id + - bookmark association id - timestamp when this rollback was established - id of user creating the rollback - any other important information not retrievable with this data @@ -117,6 +145,7 @@ And, in the main `crmp` schema, define a stored procedure that does all of the a Once the rollback schema is populated with data reflecting a given point in history, the users with interest in it can query it as if it is the actual CRMP database. It will not in general be wise to allow the users to modify this database, since those modifications will not under any circumstances be propagated to the real database. For experimental purposes, with appropriate, loudly stated caveats, this rule may be relaxed. Any number of rollback schemas can be established. They do not interact with each other, nor with the live CRMP database. But because each rollback schema will be comparable in size to the live database, we may wish to limit their number and their lifetimes. + ### Efficient grouping of data versions CRMP partners periodically release new versions of a dataset. Such releases typically contain many thousands of observations. A version release amounts to an update to each such datum. @@ -129,34 +158,39 @@ Releases typically are, in time order: Each observation in the second release is a revision to an observation in the first release. #### An unsatisfactory approach -It is possible to group the observations in such a release individually, by associating a bookmark to each updated item's history record. 
This requires the same number of bookmark associations as there are observations, i.e., it scales linearly with the number of observations. This is not desirable and should be avoided if at all possible. +It is possible to bookmark observations in such a release individually, by associating a bookmark to each updated item's history record. This requires the same number of bookmark associations as there are observations, i.e., it scales linearly with the number of observations. Given the number of observations, this is highly undesirable and should be avoided whenever possible. #### Bracketing -Bracketing uses only two bookmarks to demarcate the beginning and end of a batch of updates: +Bracketing uses only two bookmarks per group, one each to demarcate the beginning and end of a group: - Let $U$ be the set of updates that add the new release. - Define bookmarks $B_1$ and $B_2$ to demarcate (bracket) the release. - Perform $Bracket(B_1, U, B_2)$. -Bracketing requires only constant time and space relative to the number of updates within it. It significantly improves both memory footprint and retrieval time. +Bracketing requires only constant time and space relative to the number of updates within it. + ##### Bracketing for large datasets (QA releases, other updates) -QA releases and other updates are expected to arrive in large batches. If it is not operationally possible to perform the updates for a new release within a single transaction, this approach can generalized to a small number of bracketing operations which together encompass the whole release. This will still significantly reduce the space and time required to bracket and retrieve a large group of observations. +QA releases and other updates are expected to arrive in large batches. If it is not operationally possible to perform the updates for a new release within a single transaction, this approach can be generalized to a small number of bracketing operations which together encompass the whole release. 
This will still significantly reduce the space and time required to bracket and retrieve a large group of observations, even if it does scale linearly. (A 3+ order of magnitude reduction in associations is still a significant win.) + ##### Bracketing for regular ingestion (`crmprtd`) -Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. Typically, dozens to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, and is linear, not constant space and time in the number of observations, but it is better than bookmarking one observation at a time. +Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. Typically, dozens to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, and scales linearly in total observations, but it is better than bookmarking one observation at a time. -Considerations: +##### Considerations -- It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. This would enable an entire raw dataset to be extracted with simpler and more error-resistant queries. Here is where the idea of a bookmark association including a temporal column (or a more general informational text column) would be useful. -- In some sense a fully normalized representation for this case includes: - - A single bookmark associated multiple times to a series of groups of observations. - - One part (column) of the association distinguishes bracket-start and bracket-end. +It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. 
This would enable an entire raw dataset to be extracted with simpler and more error-resistant queries. This could be done with disciplined naming in plain text, but that can be error-prone and hard to debug. + +- This is where extending a bookmark association record with one or more adjunct columns would be useful. +- A fully normalized representation for such extensions includes: + - A single bookmark can be associated multiple times to (groups of) observations. + - One adjunct (column) of the association distinguishes bracket-begin and bracket-end. - Another part (column) of the association distinguishes the group. Time of ingestion is a natural discriminator for this, but it may be too restricted for possible more general uses of "many groups labelled by a single bookmark". -- For more details, see Implementation notes below. +- For more details, see [[#Implementation notes]] below. ## Implementation notes We begin to see the outlines of a plausible implementation, as follows. + ### Tables **Table `bookmarks`** @@ -182,25 +216,24 @@ Questions: Q: Why separate association from bookmark proper? -A: +A: To support multiple uses of the same bookmark. - Brackets share the same bookmark info, but are associated as bracket-begin, bracket-end. - We likely want to bracket datasets ingested by `crmprtd` using the same bookmark. -- Nevertheless the normalization here seems a bit awkward, or rather to permit more than valid usage. Some further considering required. 
- -| Column | Type | Remarks | -| ------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `bookmark_association_id` | `int` | PK | -| `bookmark_id` | `int` | FK `bookmarks` | -| ? `type` | ?`enumeration`; ?`int` (FK to type table); ?`text`. Values: `'bookmark'`, `'bracket-begin'`, `'bracket-end'`. | Still some question to the wisdom of encoding this aspect of bookmark usage with a separate column. If we do, enumeration type might be best. | -| ? `group` | ?`timestamp`; ?`int`; ?`text` | For distinguishing multiple associations of the same bookmark. Specific usages: `crmprtd` ingestion; possibly for regular QA releases. Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). | -| ? `aux_info` | `text` | Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that it is an integer. 
| -| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | -| `meta_network_hx_id` | `int` | PK `meta_network_hx` | -| `meta_station_hx_id` | `int` | PK `meta_station_hx` | -| `meta_history_hx_id` | `int` | PK `meta_history_hx` | -| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | + +| Column | Type | Remarks | +| ------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `bookmark_association_id` | `int` | PK | +| `bookmark_id` | `int` | FK `bookmarks` | +| ? `type` | ?`enumeration`; ?`int` (FK to type table); ?`text`. Values: `'bookmark'`, `'bracket-begin'`, `'bracket-end'`. | Still some question to the wisdom of encoding this aspect of bookmark usage with a separate column. If we do, enumeration type might be best. | +| ? `group` | ?`int`; ?`timestamp`; ?`text` | For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). | +| ? `aux_info` | `text` | Nullable.
Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that it is an integer. | +| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | +| `meta_network_hx_id` | `int` | PK `meta_network_hx` | +| `meta_station_hx_id` | `int` | PK `meta_station_hx` | +| `meta_history_hx_id` | `int` | PK `meta_history_hx` | +| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | Constraints: @@ -212,12 +245,12 @@ Questions: ### Functions, stored procedures -Since bookmarking is a non-trivial activity, it will be useful to encode it in stored procedures. There is some question of whether some or all of this should instead be utility Python code in the PyCDS repo proper and not SP's within the database, but we'll mix em all up here in one list. +Since bookmarking is a non-trivial activity, it will be useful to encapsulate it in code. There is some question of whether some or all of this should be utility Python code in the PyCDS repo proper vs. SP's within the database, but we'll mix 'em all up here in one list. 1. Create a bookmark association at current time (current state of database). 2. Check tuple validity. 3. Create a bookmark association from a past state (history tuple). Check validity of tuple. -4. Determine support (see discussion) of an observation. Result is a valid history tuple. Can then create bookmark association to it. +4. Determine support (see [[#Metadata support set]]) of an observation. Result is a valid history tuple. Can then create bookmark association to it. 5. Perform bracketing operation. 1. Create bookmark(s). 2. Create bracket-begin association. @@ -227,6 +260,7 @@ Since bookmarking is a non-trivial activity, it will be useful to encode it in s ### Triggers 1. Enforce values of `mod_time`, `mod_user` in `bookmarks` and `bookmark_associations`. (As for history tracking; reuse tf.) 
+ ## History tuples, database subset, and validity Our goal here is to check whether a given set of history id's is valid, i.e., does it represent a real, consistent historical state of the database. This will be useful when attempting to define a bookmark association post hoc. From 2c3c9a61720acbed27f42e8b8cad2c9e17327232 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Tue, 26 Aug 2025 16:15:48 -0700 Subject: [PATCH 04/13] Rough draft --- docs/bookmarks.md | 433 +++++++++++++++++++++++++++++++++------------- 1 file changed, 309 insertions(+), 124 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index 27a1b8e7..bf7e9d58 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -1,6 +1,6 @@ # Bookmarks aka version tags aka commits aka ... -# Table of contents +## Table of contents - [Terminology](#terminology) - [Facts and assumptions](#facts-and-assumptions) @@ -28,9 +28,22 @@ - [Historical support, $Sh(X)$](#historical-support-shx) - [Current (or latest) support, $Sc(X)$](#current-or-latest-support-scx) - [Support at tag, `St(X,T)`](#support-at-tag-stxt) + +_TOC courtesy of [Lucio Paiva](https://luciopaiva.com/markdown-toc/)._ + ## Terminology -- **Bookmark**: A named object that points at a set of history records, one per history-tracked table. This definition conflates the notion of a bookmark proper and the association of it to history records, which actually are distinct, but it will suffice for now. +- **Bookmark**: A named object that designates a *point in history*. This is an imprecise usage of the term "bookmark", which is actually two related things, a *bookmark label* and a *bookmark association*: + +- **Bookmark label**: A data object bearing a label used for bookmarking. Several different points in history can be tagged with the same label. + +- **Bookmark association**: A data object that associates a *bookmark label* to a specific point in history. + +- **Point in history**: The current values of all items in history-tracked tables. 
It excludes non-history tracked tables, except as adjuncts to the current state. (Note that we cannot be sure what the non-history tracked tables might have contained in the past. Only the present is knowable with such tables.) + - **Current point in history**: The state of the history-tracked collections at the current point in time, in whatever sense "current" can be understood. + - **Past point in history**: An actual previous state of history-tracked collections. Such a state was the current point in history in the database at some moment, however briefly. + +- **History tuple**: A tuple of history id's, one for each history table, in some specified order of those tables. There are many possible tuples of history id's, but only some of them represent actual points in history. Those that do represent actual points in history are called (valid) history tuples. The rest are invalid and represent nothing useful. See [[#History tuples, historical subsets, and their validity]] for how to check tuple validity. ## Facts and assumptions @@ -38,98 +51,212 @@ - History tables are append-only. - Each history table records the changes made to the entire collection *in temporal order of the changes*. -- Each successive update to the collection is recorded by appending a record to the history table; therefore temporal order is also the order by ascending history id. +- Each successive update to a collection is recorded by appending a record to its history table; therefore temporal order is also the order by ascending history id. **Assumptions** -- No existing record in the history table is ever modified. +- No existing record in a history table is ever modified. **Therefore** -- If a bookmark is associated to a record in a history table, it represents the history of that collection up to that point in time. A bookmark can be thought of by analogy with a Git tag, in the sense that both are pointers to a specific state of the relevant items. 
+- If a bookmark is associated to a record in a history table, it represents the history of that collection up to that point in time. A bookmark association can be thought of by analogy with a Git tag, in the sense that both are pointers to a specific state of the relevant items. - Two such bookmark associations, say $B_1$ and $B_2$, bracket a set of changes recorded in the history table. The delta between them is exactly those changes recorded in the history table, in history id order, between $B_1$ (exclusive) and $B_2$ (inclusive). **For further consideration** - Bookmark associations can be, and most naturally are, stored in order of the association operations, that is, temporally. Therefore we can read out a series of successive changesets simply by examining the bookmark associations in the order they are made. - - However, that's not true if we allow bookmarking of non-latest states, which is probably going to be desirable. We can't think ahead perfectly. Hmmm. + - However, that is not true if we allow bookmarking of non-latest states, which is probably going to be needed. We already have history, and we won't always anticipate future needs. Hmmm. - Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These should be consistent across all history tables; verify this thinking. -## Basic bookmark operations +## History tuples, historical subsets, and their validity + +Our goal here is to check whether a given tuple of history id's is valid, i.e., does it represent a real point in time of the history collections. This will be useful when attempting to define a bookmark association after the fact. + +**Historical subset**: A history tuple defines a subset of the history, namely all those history records in each history table that occur at or before the corresponding history id in the tuple. 
Under the assumption (see above) that the temporal order of history records is the same as the history id order, "at or before" means that the record's history id is less than or equal to the history id in the tuple. + +**Validity**: A historical subset is valid if and only if it exhibits referential integrity. That is, every history record referenced by a history record in that subset must also be found in the subset. Otherwise (i.e., when there is a violation of referential integrity within the subset) the historical subset is not valid. + +A history tuple is valid if and only if it implies a valid database historical subset. + +**Algorithms for checking tuple validity**: Given this definition, algorithms for checking tuple validity are straightforward: + +1. The naive algorithm checks every reference in any given subset for presence in the referenced collection's historical subset. But this is a huge number of records in most cases. +2. A less naive and much faster algorithm relies on the assumption that every actually occurring historical state of the database was valid (quite a reasonable assumption!), and therefore that history tables reflect that. This assumption allows us to check only that reference history id's are less than or equal to the corresponding collection history id in the tuple. + +**Uses**: + +1. Creating a (valid) bookmark association post hoc. +2. It is possible that a history tuple may be presented for checkout (see below) that is not known to be valid. +3. It is also possible that a single point of a single collection may be presented and we wish to construct a valid database historical state from it. Validity criteria tell us how to do this. + +## History operations + +We'll need a small handful of operations related directly to history records. These form the foundation for bookmark operations. + +### Validate a history tuple + +$Validate(S)$: Given a history tuple $S$, check that it is valid. Raise an error if it is not. 
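
The faster of the two validity checks described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not PyCDS code: the function name `validate_history_tuple`, the in-memory record layout (keys `hx_id` and `refs`), and the `references` map saying which history table points at which are all hypothetical and chosen for the example. A real implementation would more likely be a SQL query or stored procedure.

```python
from typing import List, Mapping


class InvalidHistoryTuple(ValueError):
    """Raised when a history tuple does not represent a real point in history."""


def validate_history_tuple(
    history_tuple: Mapping[str, int],
    history: Mapping[str, List[dict]],
    references: Mapping[str, List[str]],
) -> None:
    """Check a history tuple using the 'less naive' algorithm.

    history_tuple: history table name -> history id (hypothetical layout).
    history:       history table name -> list of records, each a dict with
                   'hx_id' and 'refs' (referenced table name -> referenced hx id).
    references:    history table name -> names of the history tables it references.
    """
    for table, ref_tables in references.items():
        bound = history_tuple[table]
        for record in history[table]:
            if record["hx_id"] > bound:
                continue  # record lies outside the historical subset
            for ref_table in ref_tables:
                # Every reference must point at or before the tuple's id
                # for the referenced table.
                if record["refs"][ref_table] > history_tuple[ref_table]:
                    raise InvalidHistoryTuple(
                        f"{table} record {record['hx_id']} references "
                        f"{ref_table} record {record['refs'][ref_table]}, "
                        f"which is after {history_tuple[ref_table]}"
                    )


# Toy data: two history tables, one referencing the other (assumed relationship).
history = {
    "meta_network_hx": [{"hx_id": 1, "refs": {}}, {"hx_id": 2, "refs": {}}],
    "meta_station_hx": [
        {"hx_id": 1, "refs": {"meta_network_hx": 1}},
        {"hx_id": 2, "refs": {"meta_network_hx": 2}},
    ],
}
references = {"meta_network_hx": [], "meta_station_hx": ["meta_network_hx"]}

# A tuple covering both tables up to their latest records passes silently.
validate_history_tuple(
    {"meta_network_hx": 2, "meta_station_hx": 2}, history, references
)
```

In this toy data, the tuple `{"meta_network_hx": 1, "meta_station_hx": 2}` would be rejected, because the second station record refers to a network record that falls outside the subset.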
+ +### Given a set of history records, get the latest undeleted (LU) records + +Suppose we have a set of history records (from a single collection). This set might be the entire history of the collection, or it might be a subset of those records, perhaps those occurring before a given history id or a given modification time. + +Given such a set, we want to extract the *latest undeleted* (LU) history records, one for each item id present in the set, and subject to the condition that there is no history record that represents a deletion of that item in the set. + +The LU set represents what the collection looked like at the point in time represented by the set. Essentially, we ignore all the earlier history, and all the records that have been deleted up to that point, retaining only the latest undeleted ones. + +Let $H$ be such a set of history records. Then $LU(H)$ is a subset of $H$ such that: +- For each item id $i$ present in $H$ (recall that item id's are not unique in history records): + - there is at most one history record $h_i \in LU(H)$ with item id $i$; + - $h_i$ has the largest history id of all history records in $H$ with item id $i$; + - that item has not been deleted ($h_i$ does not have the deleted flag set). + +##### Relation to historical subsets + +When we have a historical subset (defined above), then the collections of LU records from each history table in the subset give us the state of the history-tracked collections at the point in time represented by the historical subset. The main table at the point in time represented by the historical subset is given by $LU(H)$ for each corresponding history table $H$. + +##### Implementation + +It's straightforward to construct a query that yields the history id's of the LU items in a given collection. These history id's can then be used to extract part or all of the LU records from the history table. + +Here's a specific example for `meta_network_hx`. 
In this query, `<condition>` is the condition that extracts the subset from the full history table. We can generalize this query to any history table. + +``` +SELECT + network_id, + max(meta_network_hx_id) AS max_hx_id +FROM + meta_network_hx +WHERE <condition> +GROUP BY + network_id +HAVING + NOT bool_or(deleted) +``` + +The `<condition>` could look like one of the following: -We believe that the operation of "bookmark the current state" should be applied atomically across all history-tracked tables. That is, a bookmark does not associate to a single history table record, but rather to a contemporaneous set of records, one per history table, that represent a real point in time in the history of the database, and that this is an atomic operation. +- `meta_network_hx_id <= upper_bound` (full previous point in history) +- `lower_bound < meta_network_hx_id AND meta_network_hx_id <= upper_bound` (partial history between two previous points) +- `mod_time <= upper_bound` (full previous history bounded by when modifications were applied) +- etc. -Post-hoc bookmarking raises slightly more difficult issues, but the principle remains the same: each bookmark should represent a real point in the history of the database. +Again looking a little forward in this document, a common case will be where the condition is related to one or more bookmarks. +##### Other notes - +- Fidelity to actual history would require that there are no gaps in the set of history records, i.e., that we haven't arbitrarily dropped some from the middle. However, that is not strictly necessary for these operations to be performed. +## Bookmark operations -### Create a bookmark +### Create a bookmark label -**Motivation**: A bookmark is a data object. It is independent of the database states it is associated to. We need to be able to create arbitrary bookmarks. 
+### Create a bookmark label -**Operation**: The act of creating a new bookmark $B$, eliding the details of bookmark implementation, is denoted $B = CreateBookmark(N, ...)$ where $N, ...$ is the bookmark name and other details. +**Motivation**: A bookmark label is an object used to tag one or more points in history. It is independent of the point(s) in history it is associated to. -### Bookmark the database state (create a bookmark association) +**Operation**: The act of creating a new bookmark label $L$, eliding the details of bookmark implementation, is denoted $L = CreateBookmarkLabel(N, ...)$ where $N, ...$ is the bookmark name and other (elided) details. -**Motivation**: This is the fundamental operation in bookmarking. "Database state" means "an actual historical state of the database as it performs updates to history-tracked tables". Such a state is represented by a tuple of history table id's, one per history table. +### Bookmark a point in history (create a bookmark association) -- Such a tuple is automatically valid if it represents the current state of the database. -- We will very likely want to bookmark a *past* state of the database after the fact. In that case, we need to check that the tuple of history records associated to the bookmark actually represents a true past state, and not just an arbitrary and inconsistent selection of history records. For an answer to this, see [[#History tuples, database subset, and validity]]. +**Motivation**: This is the fundamental operation in bookmarking. -**Operation**: Let $B$ be a bookmark. Let $S$ be a valid state of the database, represented as a tuple of history id's. Then, eliding the details of the association data object: -- $Bookmark(B, S)$ denotes the atomic operation of associating bookmark $B$ to state $S$. -- The shorthand $Bookmark(B)$ is defined as $Bookmark(B, S)$ where $S$ is the current state of the database. +**Operation**: Let $L$ be a bookmark label. Let $S$ be a tuple of history id's. 
Then, eliding the details of the association data object: +- $B = Bookmark(L, S)$ denotes the atomic (transaction enclosed) operation of: + - validating $S$, and then + - associating bookmark label $L$ to state $S$ + - returning the bookmark association $B$. +- The shorthand $Bookmark(L)$ is defined as $Bookmark(L, S)$ where $S$ is the current state of the database. +***Note***: For information on validating a tuple of history id's, see [[#History tuples, historical subsets, and their validity]]. ### Bracket (or group) a set of updates **Motivation/scenario**: A set of related updates are received or made all at one time. The canonical case is a QA update of a large set of observations. (In other cases, e.g., when a scientist is updating things, it will take a certain amount of discipline to make sure that the updates are batched together like this.) -**Operation**: Let $B_1$ and $B_2$ be two bookmarks. Let $U$ be a set of updates. Then the operation $Bracket(B_1, U, B_2)$ is defined as: +**Operation**: Let $L_1$ and $L_2$ be two bookmarks. Let $U$ be a set of updates. Then the operation $Bracket(L_1, U, L_2)$ is defined as: - *Within a transaction (i.e., atomically)*: - - Perform $Bookmark(B_1)$. + - Perform $Bookmark(L_1)$. - Perform updates $U$. - - Perform $Bookmark(B_2)$. + - Perform $Bookmark(L_2)$. -Changes between $B_1$ (exclusive) and $B_2$ (inclusive) are exactly and only those changes made in the updates. This is due to: +History records between $L_1$ (exclusive) and $L_2$ (inclusive) are exactly and only those changes made in the updates. This is due to: -- their isolation in the update transaction (so no other operations interleaved); -- the fact that change records are appended to the history tables in temporal order of change operations; therefore the last change operation is recorded at the end of the relevant history table. 
+- their isolation in the transaction (so no other update operations are interleaved); +- the fact that change records are appended to the history tables in temporal order of change operations. **For further discussion and analysis**: -- Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark prefix $B$, and allow the system to construct bookmarks $B_1$ and $B_2$ from $B$. We can then define $Bracket(B, U) = Bracket(B_1, U, B_2)$, where $B_1$ and $B_2$ are the constructed bookmarks. (Update: See use of auxiliary columns in bookmark association table in [[#Implementation notes]] below.) +- Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark label $L$, and allow the system to construct bookmarks $L_1$ and $L_2$ from $L$. In fact, we use the same label, and the bookmark association carries the distinction between $L_1$ and $L_2$. We can then define $Bracket(L, U) = Bracket(L_1, U, L_2)$, where $L_1$ and $L_2$ are the constructed bookmarks. See the use of auxiliary columns in the bookmark association table in [[#Implementation notes]] below. ## Applications -### Database state reconstruction or Rollback +### Efficient grouping of data versions + +CRMP partners periodically release new versions of a dataset. Such releases typically contain many thousands of observations. A version release amounts to an update to each such datum. + +Releases typically are, in time order: + +1. A raw dataset. This dataset frequently arrives incrementally, via `crmprtd`. +2. A QA-adjusted dataset. This dataset is expected to arrive in one or a few large batches. + +Each observation in the second release is a revision to an observation in the first release. +#### An unsatisfactory approach + +It is possible to bookmark observations individually, by associating a bookmark to each updated item's history records. 
This requires the same number of bookmark associations as there are observations, i.e., it scales linearly with the number of observations. Given the number of observations, this is highly undesirable and should be avoided whenever possible. +#### Bracketing + +Bracketing uses only two bookmarks per group, one each to demarcate the beginning and end of a group: + +- Let $U$ be the set of updates applied to the database for a given release. +- Define bookmarks $L_1$ and $L_2$ to demarcate (bracket) the release. +- Perform $Bracket(L_1, U, L_2)$. + +Bracketing requires only constant time and space relative to the number of updates within it. + +##### Bracketing for large datasets (QA releases, other updates) + +QA releases and other updates are expected to arrive in large batches. If it is not operationally possible to perform the updates for a new release within a single transaction, this approach can be generalized to a small number of bracketing operations which together encompass the whole release. This will still significantly reduce the space and time required to bracket and retrieve a large group of observations, even if it does scale linearly. (A 3+ order of magnitude reduction in associations is still a significant win.) + +##### Bracketing for regular ingestion (`crmprtd`) + +Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. Typically, dozens to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, and does scale linearly in total observations, but it is nonetheless better than bookmarking one observation at a time. + +##### Considerations + +It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. 
This would enable an entire raw dataset to be extracted with simpler and more error-resistant queries. This could be done with disciplined naming in plain text, but that can be error prone and hard to debug. + +- This is where extending a bookmark association record with one or more adjunct columns would be useful. +- A fully normalized representation for such extensions includes: + - A single bookmark label can be associated multiple times to (groups of) observations. + - One attribute of the bookmark association distinguishes bracket-start and bracket-end. + - Another attribute of the association distinguishes the group. Time of ingestion is a natural discriminator for this, but it may be too restricted for more general cases of "many groups labelled by a single bookmark". Possibly this should be one or both of: a group index (non-null, integer) and `aux_info` (nullable, text, which can encode a date or any other relevant information). +- For more details, see [[#Implementation notes]] below. + +### Historical reconstruction or Rollback #### Outline -The design of history tracking makes easy in concept to reconstruct the complete state of the database from a bookmark (or more precisely a bookmark association). This is in fact what bookmarking is for. +The design of history tracking makes it easy (although not necessarily *fast*) to reconstruct a point in history from a bookmark (or more precisely a bookmark association). In other words, it is easy to produce the historical subset given a history tuple. -Q: How do we do that? -A: Query the latest state of each item in the collection whose history id is less than or equal to the id of the record the bookmark association points at. 
One such query is the following: For each collection (i.e., each history-tracked table) -``` -SELECT DISTINCT ON (collection_item_id) * -FROM collection_hx -WHERE collection_hx_id <= collection_hx_id_from_bookmark -ORDER BY collection_hx_id DESC -``` -returns the set of collection items that was current as of the bookmark. +We already have the required tools in the definition of LU (latest undeleted) records. Rollback is just the operation of generating these records and storing them somewhere. #### Implementation considerations -For metadata tables, which have few records, the above query is likely fast. For `obs_raw_hx`, it will scan a huge number of records. To make it perform better, further WHERE conditions may have to be added and possibly judicious indexing on the history tables. But see also below. +For metadata tables, which have few records, the basic LU query is likely fast. -Alternatively: Further to the problem of `obs_raw_hx` being enormous, and queries against it therefore taking very long times, here is a possibility, in which we create a separate rolled-back version of the database. +For `obs_raw_hx`, the LU query will necessarily scan a huge number of records. To make it perform better: + +- Create appropriate indexes on the history tables. +- Use the tightest possible WHERE conditions. If, for example, only observations from a specific network or from specific stations are desired, then encode that in the WHERE condition. + +For efficiency and convenience we are likely to want to store the result in a separate set of tables, which are best housed in their own separate schema. In a separate schema, call it `crmp_rollback`, do the following. Given a bookmark: -- Establish replica of the `crmp` schema (i.e., table definitions without data) including main tables but excluding history tables (history tables are redundant here). 
+- Establish a structural copy of the `crmp` schema (i.e., table definitions without data) including main tables but excluding history tables (history tables are redundant here). - Include FK relationships, indexes, and other things as needed. - Duplicate the content of the non-history tracked tables (at the time of this rollback). - Populate the history-tracked tables using queries as above. - Define and populate (one row) a rollback table that contains at least the following information: - - bookmark association id + - bookmark association ids, if relevant + - text of the WHERE condition (the bookmark association ids may not be relevant, or may not be the full story) - timestamp when this rollback was established - id of user creating the rollback - any other important information not retrievable with this data @@ -138,7 +265,7 @@ In a separate schema, call it `crmp_rollback`, do the following. Given a bookmar And, in the main `crmp` schema, define a stored procedure that does all of the above: -- Given a bookmark id and rollback schema name +- Given a bookmark id (or more broadly a WHERE condition) and rollback schema name - In a valid order (respecting foreign key dependencies) - With optimized queries insofar as possible @@ -146,63 +273,107 @@ Once the rollback schema is populated with data reflecting a given point in hist Any number of rollback schemas can be established. They do not interact with each other, nor with the live CRMP database. But because each rollback schema will be comparable in size to the live database, we may wish to limit their number and their lifetimes. -### Efficient grouping of data versions +### Extraction of specific subsets -CRMP partners periodically release new versions of a dataset. Such releases typically contain many thousands of observations. A version release amounts to an update to each such datum. +In some cases, it's possible that the (historical) records of interest lie only in a restricted range. 
This would be a much smaller dataset than the entire set of records in `obs_raw` at a given point in history. With suitable indexes, these subsets will be much faster to extract and work with than the whole of the history-tracked tables. Some examples: -Releases typically are, in time order: +- The data selected by a particular bookmark. This might be several groups or just one, depending on how the bookmark was used -- for example, depending on whether it bracketed just one group of data or several. +- The data selected by a particular date range. This would require extracting the (latest) historical records for that time period. This does not involve bookmarks. +- A combination of the above. -1. A raw dataset. This dataset frequently arrives incrementally, via `crmprtd`. -2. A QA-adjusted dataset. This dataset is expected to arrive in one or a few large batches. +#### Example 1: Records inserted or updated within a single bracketing -Each observation in the second release is a revision to an observation in the first release. -#### An unsatisfactory approach +This could correspond to a QA update to a previously ingested set of raw observations. -It is possible bookmark observations in such a release individually, by associating a bookmark to each updated item's history record. This requires the same number of bookmark associations as there are observations, i.e., it scales linearly with the number of observations. Given the number of observations, this is highly undesirable and should be avoided whenever possible. -#### Bracketing +Let `b_begin` and `b_end` represent history tuples, either externally provided (substituted into the query) or queried directly from the bookmark tables, corresponding to a bracket-begin and bracket-end pair. -Bracketing uses only two bookmarks per group; one each to demarcate the beginning and end of a group: +The bracketed set of updates may include multiple updates to a single item in a given collection. 
After the fact, we are only interested in the final outcome, which is the *latest* value for each collection item in the bracket. We must do a little extra work to obtain the latest indexes, which means taking the latest (equivalently, greatest) history id within the subset for each item id. -- Let $U$ be the set of updates that add the new release. -- Define bookmarks $B_1$ and $B_2$ to demarcate (bracket) the release. -- Perform $Bracket(B_1, U, B_2)$. +Here is the specific query for `obs_raw_hx`. We can generalize to any history table from this example. -Bracketing requires only constant time and space relative to the number of updates within it. +``` +SELECT + obs_raw_id, + max(obs_raw_hx_id) AS max_obs_raw_hx_id +FROM + obs_raw_hx +WHERE b_begin.obs_raw_hx_id < obs_raw_hx.obs_raw_hx_id + AND obs_raw_hx.obs_raw_hx_id <= b_end.obs_raw_hx_id +GROUP BY + obs_raw_hx.obs_raw_id +HAVING + NOT bool_or(obs_raw_hx.deleted) +``` -##### Bracketing for large datasets (QA releases, other updates) +These history id's, `max_obs_raw_hx_id`, select the latest records in the bracketed set. -QA releases and other updates are expected to arrive in large batches. If it is not operationally possible to perform the updates for a new release within a single transaction, this approach can generalized to a small number of bracketing operations which together encompass the whole release. This will still significantly reduce the space and time required to bracket and retrieve a large group of observations, even if it does scale linearly. (A 3+ order of magnitude reduction in associations is still a significant win.) +##### Caution! Bracketed sets vs valid historical subsets -##### Bracketing for regular ingestion (`crmprtd`) +It's tempting to think that the collection of records selected from each history-tracked collection by the above process would jointly constitute a valid historical subset. Not so! -Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal.
Typically, dozens to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, and scales linearly in total observations, but it is better than bookmarking one observation at a time. +Let's consider what is probably the most common and germane example, updates to `obs_raw`. -##### Considerations +Suppose the bracketed updates were *only* to `obs_raw`, a not unlikely scenario. Then the collection of history records obtained by the above process, from all history tables, contains only records from `obs_raw_hx`. There are no metadata records whatsoever in this set, because none were modified. But those `obs_raw_hx` records necessarily point at metadata history records ... which are not in the set. -It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. This would enable an entire raw dataset to be extracted with simpler and more error-resistant queries. This could be done with disciplined naming in plain text, but that can be error prone and hard to debug. +What's going on here? The historical metadata supporting the observations is drawn from the entire set of latest records prior to the bracket-end bookmark, potentially as far back as the first record ever inserted. + +It's important to keep this slightly subtle point in mind when working with brackets or other historical subsetting. + +##### Question: Which supporting metadata? + +Again, considering brackets with updates only to `obs_raw` will bring things into sharper focus. + +We have three different plausible choices for the metadata supporting these observations: + +1. The historical metadata directly linked to each `obs_raw_hx` record within the bracket. +2. 
The *latest* version, *within the subset implied by the bracket-end*, of the metadata linked to each `obs_raw_hx` record within the bracket. +3. The *current* version of the metadata item linked to each `obs_raw_hx` record within the bracket. This is not constrained by the bracket-end. Therefore, unlike the above two choices, it can vary as time passes, i.e., as further updates to the linked records are made. Those updates can include deletion, so it is doubly perilous to consider using this choice. We cannot recommend it. + +The correct choice depends on context and intention, although (3) is highly questionable. There is no universally correct choice. +#### Example 2: Records inserted or updated in a specific time period + +We can also form a subset based on time constraints. This is not fundamentally different, but there are some additional or sharpened considerations. + +Let `t_begin` and `t_end` be timestamps defining the time period of interest. We want to extract the subset of records that were inserted or updated in this period. + +Here is the specific query for `obs_raw_hx`, which is the most likely target for this kind of query. We can generalize to any history table from this example. + +``` +SELECT + obs_raw_id, + max(obs_raw_hx_id) AS max_obs_raw_hx_id +FROM + obs_raw_hx +WHERE t_begin < mod_time AND mod_time <= t_end +GROUP BY + obs_raw_id +HAVING + NOT bool_or(deleted) +``` + +All three considerations described above about what records are germane in the subset are important here: + +1. The process of obtaining the latest historical value for each item selected within the time period. Even more so than with a bracket, the collection items selected within a specific time period may experience multiple updates. +2. Time period constraints do not provide any guarantee that the history records associated to any record within the temporal subset are also within that subset.
And still less if the time constraints are selected for one particular collection and not with all collections in mind. + 1. [[#Caution! Bracketed sets vs valid historical subsets]] is relevant, but with the time constraints playing the role of the brackets. This is perhaps even more pointed because of the lack of the sanity guaranteed by a bookmark's validity constraint. + 2. [[#Question Which supporting metadata?]] This translates over pretty much unchanged. -- This is where extending a bookmark association record with one or more adjunct columns would be useful. -- A fully normalized representation for such extensions includes: - - A single bookmark can be associated multiple times to (groups of) observations. - - One adjunct (column) of the association distinguishes bracket-start and bracket-end. - - Another part (column) of the association distinguishes the group. Time of ingestion is a natural discriminator for this, but it may be too restricted for possible more general uses of "many groups labelled by a single bookmark". -- For more details, see [[#Implementation notes]] notes below. ## Implementation notes -We begin to see the outlines of a plausible implementation, as follows. +We begin to see the outlines of an implementation, as follows. ### Tables **Table `bookmarks`** -| Column | Type | Remarks | -| -------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `bookmark_id` | `int` | PK | -| `name` | `text` | | -| ? `comment` | `text` | Meaning? Utility? | -| ? `network_id` | `int` | FK `meta_network`. Utility is to distinguish bookmarks for one network from another, and allow a simple, natural name in common, such as 'QA'. Normalizes a common use case, I think. 
Tempting to make nullable, but caution, nullable columns have frequently been abused in CRMP. | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | +| Column | Type | Remarks | +| -------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `bookmark_id` | `int` | PK | +| `name` | `text` | | +| `comment` | `text` | Elaboration of meaning or use of the bookmark. Example: "QA release 2021" | +| ? `network_id` | `int` | FK `meta_network`. Use is to distinguish bookmarks for one network from another, and allow a simple, natural name in common, such as 'QA'. Normalizes a common use case, I think. Tempting to make nullable, but caution, nullable columns have frequently been abused in CRMP. | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | Constraints @@ -211,33 +382,33 @@ Constraints Questions: 1. Apply history tracking to this table? Reason, utility? +2. Is `network_id` actually needed? A network is implied by a bookmark's association to items. This is also true of variables, but they too have a sdirect network association (`network_id`) that is not strictly necessary. That fact inspired the idea of having a direct association for bookmarks as well. The slight over-specificity may be offset by the utility of easily segregating these things (variables, bookmarks) by network. **Table `bookmark_associations`** Q: Why separate association from bookmark proper? - A: To support multiple uses of the same bookmark. - Brackets share the same bookmark info, but are associated as bracket-begin, bracket-end. -- We likely want to bracket datasets ingested by `crmprtd` using the same bookmark. 
- -| Column | Type | Remarks | -| ------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `bookmark_association_id` | `int` | PK | -| `bookmark_id` | `int` | FK `bookmarks` | -| ? `type` | ?`enumeration`; ?`int` (FK to type table); ?`text`. Values: `'bookmark'`, `'bracket-begin'`, `'bracket-end'`. | Still some question to the wisdom of encoding this aspect of bookmark usage with a separate column. If we do, enumeration type might be best. | -| ? `group` | ?`int`; ?`timestamp`; ?`text` | For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). | -| ? `aux_info` | `text` | Nullable.
Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that it is an integer. | -| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | -| `meta_network_hx_id` | `int` | PK `meta_network_hx` | -| `meta_station_hx_id` | `int` | PK `meta_station_hx` | -| `meta_history_hx_id` | `int` | PK `meta_history_hx` | -| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | +- We likely want to bracket multiple groups of observations -- e.g., those ingested by `crmprtd` at any one time -- using the same bookmark. + +| Column | Type | Remarks | +| ------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `bookmark_association_id` | `int` | PK | +| `bookmark_id` | `int` | FK `bookmarks` | +| `role` | ?`enumeration`; ?`int` (FK to role table); ?`text`. Values: `'single'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | +| ? `group` | ?`int`; ?`timestamp`; ?`text` | For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). | +| ? `aux_info` | `text` | Nullable.
Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that `group` is an integer. | +| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | +| `meta_network_hx_id` | `int` | PK `meta_network_hx` | +| `meta_station_hx_id` | `int` | PK `meta_station_hx` | +| `meta_history_hx_id` | `int` | PK `meta_history_hx` | +| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | Constraints: -- unique (`bookmark_id`, `type`, `group`) +- unique (`bookmark_id`, `role`, `group`) Questions: @@ -245,7 +416,7 @@ Questions: ### Functions, stored procedures -Since bookmarking is a non-trivial activity, it will be useful to encapsulate it in code. There is some question of whether some or all of this should be utility Python code in the PyCDS repo proper vs. SP's within the database, but we'll mix 'em all up here in one list. +Since bookmarking is a non-trivial activity, it will be useful to encapsulate its operations in code. There is some question of whether some or all of this should be Python code in the PyCDS repo proper vs. SP's within the database, but we mix 'em all up here in one list. 1. Create a bookmark association at current time (current state of database). 2. Check tuple validity. @@ -261,34 +432,9 @@ Since bookmarking is a non-trivial activity, it will be useful to encapsulate it 1. Enforce values of `mod_time`, `mod_user` in `bookmarks` and `bookmark_associations`. (As for history tracking; reuse tf.) -## History tuples, database subset, and validity - -Our goal here is to check whether a given set of history id's is valid, i.e., does it represent a real, consistent historical state of the database. This will be useful when attempting to define a bookmark association post hoc. 
- -**History tuples**: We can regard a bookmark association as a tuple of collection history id's, one per collection (hereafter a history id tuple, history tuple, or just tuple). - -**Database history subset**: A history tuple defines a subset of the database history, namely all those history records in each history table that occur before the corresponding history id in the tuple. Under the reasonable assumption (see above) that the temporal order of history records is the same as the history id order, "before" here means that history id is less than or equal to the history id in the tuple. - -**Validity**: Not all such database subsets, and therefore such tuples, are valid. - -The criterion for validity is essentially referential integrity. That is, within the subset of history records implied by the tuple, all references by a history record in that subset to another history record must also be found in the subset. Otherwise (i.e., when there is a violation of referential integrity within the subset) the database history subset is not valid. - -History id tuples are valid iff they imply a valid database history subset. - -**Algorithms for checking tuple validity**: Given this definition, algorithms for checking tuple validity are straightforward: - -1. The naive algorithm checks every reference in any given subset for presence in the referenced collection history subset. But this is a huge number of records in most cases. -2. A less naive and much faster algorithm relies on the assumption that every actually occurring historical state of the database was valid (quite a reasonable assumption!), and therefore that history tables reflect that. This assumption allows us to check only that reference history id's are less than or equal to the corresponding collection history id in the tuple. - -**Applications**: - -1. Creating a (valid) bookmark association post hoc. -2. 
It is possible that a history tuple may be presented for checkout (see below) that is not known to be valid. -3. It is also possible that a single point of a single collection may be presented and we wish to construct a valid database historical state from it. Validity criteria allow us to do this. - ## Metadata support set -**Note/TODO**: This section may not be all that useful any more ... but I include it for consideration. It may also be overcomplicated ... the support of a set of observations may be more general than is really useful. It might be better to consider the support of only the earliest and latest records in the set, since those effectively bracket the group of observations. But, post-hoc, i.e., non-atomically, that bracketing is too large, so we will need to look at some notion of contiguous groups if that is possible. Oy vey, more work to do here. +***Note/TODO***: This section may not be very useful any more ... but I include it for consideration. It may also be overcomplicated ... the support of a set of observations may be more general than is really useful. The idea of the "metadata support" may prove useful in talking clearly about bookmarking. In particular, it may prove useful in discussing bookmarking or bracketing a set of observations post hoc. From here on, we may abbreviate "metadata support" to "support". @@ -327,8 +473,47 @@ Is this still relevant? It seems that if we define bookmarking as an association A slightly less self-evident case of support -- The support set of of observation history record $X$ at tag `T`, denoted `St(X,T)`, is the set of metadata history records defined by: For each record in $Sh(X)$, use the metadata history record tagged by `T` for that metadata item. +- The support set of observation history record $X$ at tag `T`, denoted `St(X,T)`, is the set of metadata history records defined by: For each record in $Sh(X)$, use the metadata history record tagged by `T` for that metadata item. 
- There may be no such metadata history record for some or all of the elements of $Sc(X)$. Therefore `St(X,T)` may not contain one item for every metadata record type. - Tag `T` can tag *any* metadata history record in an item's history set. Therefore the elements of `St(X,T)` may occur *before* the historical support items for $X$. This may or may not make sense in any given context. -It is possible to define other support sets with different criteria for what metadata history records are included, but defining the criteria so that they are consistent and make sense is harder. We do not offer any other definitions here. \ No newline at end of file +It is possible to define other support sets with different criteria for what metadata history records are included, but defining the criteria so that they are consistent and make sense is harder. We do not offer any other definitions here. + +## Points in history + +***This discussion now appears irrelevant*** + +### Ordering of updates + +***Caution***: It is tempting -- and sometimes useful -- to think of a point in history in terms of the state of a single item in a collection. It must be borne in mind, however, that there are many items in any given collection and so a point in history is actually a large set, not just one item. + +History advances one change at a time. Updates to records are essentially serial. *Is that true in all contexts, e.g., in the context of a transaction?* + +Let's consider a simplified situation: An observations collection O and a metadata collection M to which observations are linked. + +A given point P in history comprises +- The states of items in O. +- The states of items in M. + +At point P in history, a read query is made to the main tables (which is the only case we are considering here; main tables are the primary interface). + +Only the states of items in O and items in M determine the result of the query at point P. 
How they got to those states -- their past histories -- is irrelevant to the result, except in cases where modification time is part of the query. Let's set that case aside for the moment. + +Consider a single item o in O and the item m in M that it links to. The value (state) of o and of m could have been reached by two different histories, i.e., two different orderings between the most recent two updates to o and m: + +1. Update o, then update m. +2. Update m, then update o. + +This extends to all items in O and all items in M: their joint states can be the product of many different orderings of updates. + +Upshot: Any given point in history can be reached by many different sequences of updates. + +### Subsequent points in history + +OK, so what? + +Let $P$, $Q$, and $R$ be a time-ordered sequence of points in history. That is, $P$ precedes $Q$ precedes $R$ in temporal order. Let the updates from $P \to Q$ be denoted $U_Q$, and the updates from $Q \to R$ be denoted $U_R$. + +Then at point $R$, all interleavings of the updates in $U_Q$ and $U_R$ are equivalent (neglecting mod time), so long as the ordering of updates to any single item is preserved. In short, it doesn't matter how we got to $R$, so long as we don't reorder updates to the same item. + +This is relevant when considering what points to bookmark, and what bookmarks to choose as a rollback point.
\ No newline at end of file From 9d6c43377eb5f57e1a404dd7e7a94a213efc5612 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Wed, 27 Aug 2025 10:51:29 -0700 Subject: [PATCH 05/13] WIP --- docs/bookmarks.md | 69 +++++++++++++++++++++++++--------------- pycds/orm/tables.py | 77 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 121 insertions(+), 25 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index bf7e9d58..f8e22d9c 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -205,7 +205,7 @@ It is possible bookmark observations individually, by associating a bookmark to Bracketing uses only two bookmarks per group; one each to demarcate the beginning and end of a group: - Let $U$ be the set of updates that update the database for a given release. -- Define bookmarks $L_1$ and $L_2$ to demarcate (bracket) the release. +- Define bookmarks $L_1$ and $L_2$ to demarcate (bracket) the release. Below, this amounts to using bracket-begin and bracket-end in the association of the same bookmark label. - Perform $Bracket(L_1, U, L_2)$. Bracketing requires only constant time and space relative to the number of updates within it. @@ -226,7 +226,8 @@ It will be useful for the bookmarks used to bracket each ingestion to bear a cle - A fully normalized representation for such extensions includes: - A single bookmark label can be associated multiple times to (groups of) observations. - One attribute of the bookmark association distinguishes bracket-start and bracket-end. - - Another attribute of the association distinguishes the group. Time of ingestion is a natural discriminator for this, but it may be too restricted for more general cases of "many groups labelled by a single bookmark". Possibly this should be one or more of group index (non-null, integer), aux_info (nullable, text, which can encode a date or any other relevant information) + - Another attribute of the association distinguishes the group. 
Time of ingestion is a natural discriminator for this, but it may be too restricted for more general cases of "many groups labelled by a single bookmark". Possibly this should be one or more of group index (non-null, integer), aux_info (nullable, text, which can encode a date or any other relevant information). + - Regarding grouping, I've rethought this and it seems as if it is redundant to other information available (namely, observation time, ingestion mod time/history id, bookmarking mod time/history id) and also potentially very confusing. - For more details, see [[#Implementation notes]] notes below. ### Historical reconstruction or Rollback @@ -366,14 +367,14 @@ We begin to see the outlines of an implementation, as follows. **Table `bookmarks`** -| Column | Type | Remarks | -| -------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `bookmark_id` | `int` | PK | -| `name` | `text` | | -| `comment` | `text` | Elaboration of meaning or use of the bookmark. Example: "QA release 2021" | -| ? `network_id` | `int` | FK `meta_network`. Use is to distinguish bookmarks for one network from another, and allow a simple, natural name in common, such as 'QA'. Normalizes a common use case, I think. Tempting to make nullable, but caution, nullable columns have frequently been abused in CRMP. | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | +| Column | Type | Remarks | +| -------------- | ----------- | ------------------------------------------------------------------------- | +| `bookmark_id` | `int` | PK | +| `name` | `text` | | +| `comment` | `text` | Elaboration of meaning or use of the bookmark. Example: "QA release 2021" | +| ? `network_id` | `int` | FK `meta_network`. 
| +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | Constraints @@ -382,7 +383,9 @@ Constraints Questions: 1. Apply history tracking to this table? Reason, utility? -2. Is `network_id` actually needed? A network is implied by a bookmark's association to items. This is also true of variables, but they too have a sdirect network association (`network_id`) that is not strictly necessary. That fact inspired the idea of having a direct association for bookmarks as well. The slight over-specificity may be offset by the utility of easily segregating these things (variables, bookmarks) by network. +2. FK `network_id` + 1. Is it actually needed? A network is implied by a bookmark's association to items. This is also true of variables, but they too have a direct network association (`network_id`) that is not strictly necessary. That fact inspired the idea of having a direct association for bookmarks as well. This over-specificity (really: denormalization) may be offset by the utility of easily segregating these things (variables, bookmarks) by network, and establishing their relationship to network *before* use elsewhere. + 2. Nullable? Tempting, but caution, nullable columns have frequently been abused in CRMP. **Table `bookmark_associations`** @@ -391,20 +394,20 @@ A: To support multiple uses of the same bookmark. - Brackets share the same bookmark info, but are associated as bracket-begin, bracket-end. - We likely want to bracket multiple groups of observations -- e.g., those ingested by `crmprtd` at any one time -- using the same bookmark.
-| Column | Type | Remarks | -| ------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `bookmark_association_id` | `int` | PK | -| `bookmark_id` | `int` | FK `bookmarks` | -| `role` | ?`enumeration`; ?`int` (FK to role table); ?`text`. Values: `'single'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | -| ? `group` | ?`int`; ?`timestamp`; ?`text` | For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). | -| ? `aux_info` | `text` | Nullable.
Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that `group` is an integer. | -| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | -| `meta_network_hx_id` | `int` | PK `meta_network_hx` | -| `meta_station_hx_id` | `int` | PK `meta_station_hx` | -| `meta_history_hx_id` | `int` | PK `meta_history_hx` | -| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | +| Column | Type | Remarks | +| ------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `bookmark_association_id` | `int` | PK | +| `bookmark_id` | `int` | FK `bookmarks` | +| `role` | ?`enumeration`; ?`int` (FK to role table); ?`text`. Values: `'single'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | +| ? `group` | ?`int`; ?`timestamp`; ?`text` | DEPRECATED For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). | +| ? `aux_info` | `text` | Nullable.
Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that `group` is an integer. | +| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | +| `meta_network_hx_id` | `int` | PK `meta_network_hx` | +| `meta_station_hx_id` | `int` | PK `meta_station_hx` | +| `meta_history_hx_id` | `int` | PK `meta_history_hx` | +| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | Constraints: @@ -414,6 +417,22 @@ Questions: 1. Apply history tracking to this table? Reason, utility? +Constraints on bookmark associations and role: +- We are planning to enforce uniqueness of (`bookmark_id`, `role`, `group`). This may not be sufficient to ensure sanity. +- What we really need to do is to enforce the ordering of uses of `role`. This really needs to require strict, ordered pairing of bracket-begin and bracket-end, with single only allowed outside pairs. Grammar for this would be `(single* (bracket_begin bracket_end)*)*` +- But, under what ordering of associations? We need to accommodate post-hoc associations. Duh ... under order of history id ... or equivalently `mod_time`. +- With that decided, it will be interesting to formulate a query that can evaluate the condition. The condition can be restated as: + - `single` can be followed by `single` or `bracket_begin` + - `bracket_begin` can be followed by `bracket_end` + - `bracket_end` can be followed by `single` or `bracket_begin` +- If a bookmark association is proposed to be added at history id `hx_id`, then find the most recent (largest history id) bookmark association with this label prior to `hx_id` and apply the above rules. This will have to be evaluated for all tables. + +Rethinking this: +- Does `group` have any utility at all? We have observation time, ingestion mod time/history id (in each history table associated to), bookmarking mod time/history id. +- So what if bookmarks overlap?
What if we want it that way? Caveat user? +- We don't really have much knowledge at this about desirable or undesirable uses of bookmarks, and yet here we are trying to constrain their use. +- Leaving this unconstrained, and eliminating the likely useless and confusing `group` seems like the best way to go at the moment. + ### Functions, stored procedures Since bookmarking is a non-trivial activity, it will be useful to encapsulate its operations in code. There is some question of whether some or all of this should be Python code in the PyCDS repo proper vs. SP's within the database, but we mix 'em all up here in one list. diff --git a/pycds/orm/tables.py b/pycds/orm/tables.py index 531fc939..96f3b9e3 100644 --- a/pycds/orm/tables.py +++ b/pycds/orm/tables.py @@ -29,12 +29,14 @@ String, Date, Index, + Enum as EnumType, ) from sqlalchemy import DateTime, Boolean, ForeignKey, Numeric, Interval from sqlalchemy.orm import relationship, synonym, declarative_base from sqlalchemy.schema import UniqueConstraint from sqlalchemy.schema import CheckConstraint from geoalchemy2 import Geometry +from enum import Enum from sqlalchemy.dialects.postgresql import CITEXT as CIText @@ -607,3 +609,78 @@ class DerivedValue(Base): name="obs_derived_value_time_place_variable_unique", ), ) + + +class BookmarkLabel(Base): + """ + A bookmark label is a named object that can be associated to one or more history + tuples. + + Every bookmark label belongs to a network. Together with the uniqueness constraint + on (name, network_id), this enables likely common bookmark names to be reused across + networks but not collide with each other. We will want to insist that name is unique, + and partitioning by network seems like an easy sanity-maintaining measure. + TODO: Review this decision. 
+ """ + + __tablename__ = "bookmark_labels" + + bookmark_label_id = Column(Integer, primary_key=True) + network_id = Column(Integer, ForeignKey("meta_network.network_id"), nullable=False) + label = Column(String, nullable=False) + comment = Column(String) + + # NB: These values enforced by trigger functions + mod_time = Column(DateTime, nullable=False, server_default=func.now()) + mod_user = Column( + String(64), nullable=False, server_default=literal_column("current_user") + ) + + UniqueConstraint("network_id", "name"), + + +class BookmarkAssociationRole(Enum): + """The SQL enumeration type for the `role` attribute of `BookmarkAssociation`. + Note that only the names of the class elements are persisted, not the values, + which are arbitrary. We've chosen here to use the same element names as values. + + See + https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.Enum + for more info.""" + + single = 'single' + bracket_begin = 'bracket_begin' + bracket_end = 'bracket_end' + + +class BookmarkAssociation(Base): + """ + A bookmark association associates a bookmark label to a tuple of history id's. + When we say "bookmark this point in history", we mean: create such an association + with a specified bookmark label. 
+ + An association includes a bookmark label, the role of the label in the association, + and the group (better name needed) of the association + """ + + __tablename__ = "bookmark_associations" + + bookmark_association_id = Column(Integer, primary_key=True) + bookmark_label_id = Column(Integer, ForeignKey("bookmark_labels.bookmark_label_id"), nullable=False) + role = Column(EnumType(BookmarkAssociationRole), nullable=False) + comment = Column(String) + + # History tuple + obs_raw_hx_id = Column(Integer, ForeignKey("obs_raw_hx.obs_raw_hx_id"), nullable=False) + meta_network_hx_id = Column(Integer, ForeignKey("meta_network_hx.meta_network_hx_id"), nullable=False) + meta_station_hx_id = Column(Integer, ForeignKey("meta_station_hx.meta_station_hx_id"), nullable=False) + meta_history_hx_id = Column(Integer, ForeignKey("meta_history_hx.meta_history_hx_id"), nullable=False) + meta_vars_hx_id = Column(Integer, ForeignKey("meta_vars_hx.meta_vars_hx_id"), nullable=False) + + # NB: These values enforced by trigger functions + mod_time = Column(DateTime, nullable=False, server_default=func.now()) + mod_user = Column( + String(64), nullable=False, server_default=literal_column("current_user") + ) + + From 30e4d7e1b77c87abb6d91f47458a181b378cd25c Mon Sep 17 00:00:00 2001 From: rod-glover Date: Wed, 27 Aug 2025 14:49:41 -0700 Subject: [PATCH 06/13] More WIP, thinking through bracket matching --- docs/bookmarks.md | 55 +++++++++++++++++++++++++++---------- pycds/orm/tables.py | 67 ++++++++++++++++++++++++++++++--------------- 2 files changed, 85 insertions(+), 37 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index f8e22d9c..87eb3498 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -394,24 +394,26 @@ A: To support multiple uses of the same bookmark. - Brackets share the same bookmark info, but are associated as bracket-begin, bracket-end. 
- We likely want to bracket multiple groups of observations -- e.g., those ingested by `crmprtd` at any one time -- using the same bookmark. -| Column | Type | Remarks | -| ------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `bookmark_association_id` | `int` | PK | -| `bookmark_id` | `int` | FK `bookmarks` | -| `role` | ?`enumeration`; ?`int` (FK to role table); ?`text`. Values: `'single'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | -| ? `group` | ?`int`; ?`timestamp`; ?`text` | DEPRECATED For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`). | -| ? `aux_info` | `text` | Nullable.
Auxiliary information about the association. Largely motivated by the desire to expand on the meaning of `group`, especially in the case that `group` is an integer. | -| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | -| `meta_network_hx_id` | `int` | PK `meta_network_hx` | -| `meta_station_hx_id` | `int` | PK `meta_station_hx` | -| `meta_history_hx_id` | `int` | PK `meta_history_hx` | -| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | +| Column | Type | Remarks | +| ------------------------- | ----------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `bookmark_association_id` | `int` | PK | +| `bookmark_id` | `int` | FK `bookmarks` | +| `role` | ?`enumeration`; ?`int` (FK to role value table); ?`text`. Values: `'single'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | +| `bracket_id` | `int` | See discussion below. | +| ~~? `group`~~ | ~~?`int`; ?`timestamp`; ?`text`~~ | ~~DEPRECATED For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`).~~ | +| `comment` | `text` | Nullable.
Auxiliary information about the association. | +| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | +| `meta_network_hx_id` | `int` | PK `meta_network_hx` | +| `meta_station_hx_id` | `int` | PK `meta_station_hx` | +| `meta_history_hx_id` | `int` | PK `meta_history_hx` | +| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | +TODO: Finish off rethinking below and adjust all documentation as necessary. Constraints: -- unique (`bookmark_id`, `role`, `group`) +- ~~unique (`bookmark_id`, `role`, `group`)~~ (see below) Questions: @@ -433,6 +435,29 @@ Rethinking this: - We don't really have much knowledge at this about desirable or undesirable uses of bookmarks, and yet here we are trying to constrain their use. - Leaving this unconstrained, and eliminating the likely useless and confusing `group` seems like the best way to go at the moment. +But ... +- If you want to allow overlapping bookmarks (but nested still can be handled automatically, see below), then you need some way to pair bracket-begin and bracket-end. This is the role of some notion such as group. There is no other use for such an attribute. Therefore it should be named something like "bracket-id", and suitable constraints established on it (fingers crossed it's not very complex). +- Constraints: + - Bracket id is not relevant and therefore should be null, or should have a fixed value, say 0, for role single. + - There can be no more than one pair of bracket-begin bracket-end with the same bracket id. (We could write this `[id updates id]`, e.g., `[2 updates 2]` for short.) 
+ - Bracket-begin with id $n$ is permitted if and only if + - there is no other bracket-begin with id $n$ + - Bracket-end with id $n$ is permitted if and only if there is + - a bracket-begin with id $n$ + - no bracket-end with id $n$ + - We could provide a default, auto-incremented value for bracket id when the user does not set it, but: + - That invites some problems if the user is not aware of the attribute or doesn't grok how to use it. But see OTOH below for a way to minimize these problems. Essentially bracket-id is an auto-generated handle that the user should carry around. + - If there are several unpaired bracket-begins, then what bracket id to apply to the next unspecified bracket-end is ambiguous. + - Probably should make bracket-id not nullable for brackets. + - OTOH, could + - Return an auto-generated bracket-id from the create-association operation with bracket-begin and no bracket id specified. + - Auto-generate bracket-end id when there is only one unpaired bracket-begin (that one's bracket id). This seems a likely scenario. However, it is surplus to requirements if we assume/require the user to carry the auto-generated bracket-begin id. + - Raise an error if more than one open bracket. +- Note that the need for bracket id falls away if we require strict nesting of brackets. Then the nesting rules tell you exactly how the bracket pairs match up. That excludes non-nested overlap, which it is conceivable may be wanted (for what??). We don't have enough info to exclude it absolutely. +- Also, while the nesting rule is easy enough to formulate (as a CFG, for example), it is more complex to enforce in code, which would require a small parser and have to allow for open brackets. So perhaps bracket id is, in addition to being more flexible, also simpler. +- Any automatic value generation for bracket id will have to be done by a trigger function. Generated columns are not permitted to reference any other row than the one being modified. 
+- Jesus, this whole thing can be simplified down to bracket id = bookmark association id, and whenever that is specified, it must be for a bracket-end matching to a bracket-begin. All other cases must be null. + ### Functions, stored procedures Since bookmarking is a non-trivial activity, it will be useful to encapsulate its operations in code. There is some question of whether some or all of this should be Python code in the PyCDS repo proper vs. SP's within the database, but we mix 'em all up here in one list. diff --git a/pycds/orm/tables.py b/pycds/orm/tables.py index 96f3b9e3..b77f1b25 100644 --- a/pycds/orm/tables.py +++ b/pycds/orm/tables.py @@ -617,7 +617,7 @@ class BookmarkLabel(Base): tuples. Every bookmark label belongs to a network. Together with the uniqueness constraint - on (name, network_id), this enables likely common bookmark names to be reused across + on (label, network_id), this enables likely common bookmark names to be reused across networks but not collide with each other. We will want to insist that name is unique, and partitioning by network seems like an easy sanity-maintaining measure. TODO: Review this decision. @@ -630,57 +630,80 @@ class BookmarkLabel(Base): label = Column(String, nullable=False) comment = Column(String) - # NB: These values enforced by trigger functions + # NB: The following values are enforced (overridden) by trigger functions. + # Defaults provided here are more documentation than anything. mod_time = Column(DateTime, nullable=False, server_default=func.now()) mod_user = Column( String(64), nullable=False, server_default=literal_column("current_user") ) - UniqueConstraint("network_id", "name"), + __table_args__ = (UniqueConstraint("network_id", "label"),) class BookmarkAssociationRole(Enum): """The SQL enumeration type for the `role` attribute of `BookmarkAssociation`. + For more on the meanings and use of association roles, see README documentation. 
+ Note that only the names of the class elements are persisted, not the values, which are arbitrary. We've chosen here to use the same element names as values. - - See + SQLAlchemy doc on enum: https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.Enum - for more info.""" + See also this SO post for usage with Alembic: + https://stackoverflow.com/a/73922844 + """ - single = 'single' - bracket_begin = 'bracket_begin' - bracket_end = 'bracket_end' + single = "single" + bracket_begin = "bracket_begin" + bracket_end = "bracket_end" class BookmarkAssociation(Base): """ A bookmark association associates a bookmark label to a tuple of history id's. When we say "bookmark this point in history", we mean: create such an association - with a specified bookmark label. + using a given bookmark label. - An association includes a bookmark label, the role of the label in the association, - and the group (better name needed) of the association + An association includes a bookmark label and the role of the label in the association. + For more on the meanings and use of association roles, see README documentation. """ __tablename__ = "bookmark_associations" bookmark_association_id = Column(Integer, primary_key=True) - bookmark_label_id = Column(Integer, ForeignKey("bookmark_labels.bookmark_label_id"), nullable=False) + bookmark_label_id = Column( + Integer, ForeignKey("bookmark_labels.bookmark_label_id"), nullable=False + ) role = Column(EnumType(BookmarkAssociationRole), nullable=False) + # bracket_begin_id matches a bracket-end to a bracket-begin. It must be non-null + # if and only if role == bracket_end, and in that case it must be the id of a + # bracket_begin association that is not already matched. This condition is enforced + # by a trigger function, which also provides a value in the case that there is + # exactly one open bracket and a value is not explicitly specified. 
+ bracket_begin_id = Column( + Integer, ForeignKey("bookmark_associations.bookmark_association_id") + ) comment = Column(String) - # History tuple - obs_raw_hx_id = Column(Integer, ForeignKey("obs_raw_hx.obs_raw_hx_id"), nullable=False) - meta_network_hx_id = Column(Integer, ForeignKey("meta_network_hx.meta_network_hx_id"), nullable=False) - meta_station_hx_id = Column(Integer, ForeignKey("meta_station_hx.meta_station_hx_id"), nullable=False) - meta_history_hx_id = Column(Integer, ForeignKey("meta_history_hx.meta_history_hx_id"), nullable=False) - meta_vars_hx_id = Column(Integer, ForeignKey("meta_vars_hx.meta_vars_hx_id"), nullable=False) + # History tuple. Every history table must be included here. + obs_raw_hx_id = Column( + Integer, ForeignKey("obs_raw_hx.obs_raw_hx_id"), nullable=False + ) + meta_network_hx_id = Column( + Integer, ForeignKey("meta_network_hx.meta_network_hx_id"), nullable=False + ) + meta_station_hx_id = Column( + Integer, ForeignKey("meta_station_hx.meta_station_hx_id"), nullable=False + ) + meta_history_hx_id = Column( + Integer, ForeignKey("meta_history_hx.meta_history_hx_id"), nullable=False + ) + meta_vars_hx_id = Column( + Integer, ForeignKey("meta_vars_hx.meta_vars_hx_id"), nullable=False + ) - # NB: These values enforced by trigger functions + # NB: The following values are enforced (overridden) by trigger functions. + # Defaults provided here are more documentation than anything. 
mod_time = Column(DateTime, nullable=False, server_default=func.now()) mod_user = Column( String(64), nullable=False, server_default=literal_column("current_user") ) - - From cd17faaeedf946e575da60e9429320d327f38162 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Wed, 27 Aug 2025 17:24:37 -0700 Subject: [PATCH 07/13] Simplify definitions, draft ORM definition --- docs/bookmarks.md | 258 ++++++++++++-------------------------------- pycds/orm/tables.py | 2 +- 2 files changed, 67 insertions(+), 193 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index 87eb3498..ccedce51 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -4,25 +4,27 @@ - [Terminology](#terminology) - [Facts and assumptions](#facts-and-assumptions) - - [Basic bookmark operations](#basic-bookmark-operations) - - [Create a bookmark](#create-a-bookmark) - - [Bookmark the database state (create a bookmark association)](#bookmark-the-database-state-create-a-bookmark-association) + - [History operations](#history-operations) + - [Validate a history tuple](#validate-a-history-tuple) + - [Given a set of history records, get the latest undeleted (LU) records](#given-a-set-of-history-records-get-the-latest-undeleted-lu-records) + - [Bookmark operations](#bookmark-operations) + - [Create a bookmark label](#create-a-bookmark-label) + - [Bookmark a point in history (create a bookmark association)](#bookmark-a-point-in-history-create-a-bookmark-association) - [Bracket (or group) a set of updates](#bracket-or-group-a-set-of-updates) - [Applications](#applications) - - [Database state reconstruction or Rollback](#database-state-reconstruction-or-rollback) - - [Outline](#outline) - - [Implementation considerations](#implementation-considerations) - [Efficient grouping of data versions](#efficient-grouping-of-data-versions) - [An unsatisfactory approach](#an-unsatisfactory-approach) - [Bracketing](#bracketing) - - [Bracketing for large datasets (QA releases, other 
updates)](#bracketing-for-large-datasets-qa-releases-other-updates) - - [Bracketing for regular ingestion (`crmprtd`)](#bracketing-for-regular-ingestion-crmprtd) - - [Considerations](#considerations) + - [Historical reconstruction or Rollback](#historical-reconstruction-or-rollback) + - [Outline](#outline) + - [Implementation considerations](#implementation-considerations) + - [Extraction of specific subsets](#extraction-of-specific-subsets) + - [Example 1: Records inserted or updated within a single bracketing](#example-1-records-inserted-or-updated-within-a-single-bracketing) + - [Example 2: Records inserted or updated in a specific time period](#example-2-records-inserted-or-updated-in-a-specific-time-period) - [Implementation notes](#implementation-notes) - [Tables](#tables) - [Functions, stored procedures](#functions-stored-procedures) - [Triggers](#triggers) - - [History tuples, database subset, and validity](#history-tuples-database-subset-and-validity) - [Metadata support set](#metadata-support-set) - [Definitions](#definitions) - [Historical support, $Sh(X)$](#historical-support-shx) @@ -35,16 +37,31 @@ _TOC courtesy of [Lucio Paiva](https://luciopaiva.com/markdown-toc/)._ - **Bookmark**: A named object that designates a *point in history*. This is an imprecise usage of the term "bookmark", which is actually two related things, a *bookmark label* and a *bookmark association*: -- **Bookmark label**: A data object bearing a label used for bookmarking. Several different points in history can be tagged with the same label. +- **Bookmark label**: A data object bearing a label used for bookmarking. Several different points in history can be tagged with the same label. This allows a common label to be used to group multiple items. -- **Bookmark association**: A data object that associates a *bookmark label* to a specific point in history. +- **Bookmark association**: A data object that associates a *bookmark label* to a specific point in history. 
This is the fundamental operation of bookmarking. - **Point in history**: The current values of all items in history-tracked tables. It excludes non-history tracked tables, except as adjuncts to the current state. (Note that we cannot be sure what the non-history tracked tables might have contained in the past. Only the present is knowable with such tables.) - **Current point in history**: The state of the history-tracked collections at the current point in time, in whatever sense "current" can be understood. - **Past point in history**: An actual previous state of history-tracked collections. Such a state was the current point in history in the database at some moment, however briefly. -- **History tuple**: A tuple of history id's, one for each history table, in some specified order of those tables. There are many possible tuples of history id's, but only some of them represent actual points in history. Those that do represent actual points in history are called (valid) history tuples. The rest are invalid and represent nothing useful. See [[#History tuples, historical subsets, and their validity]] for how to check tuple validity. - +- **History tuple**: A tuple of history id's, one for each history table, in some specified order of those tables. There are many possible tuples of history id's, but only some of them represent actual points in history. Those that do represent actual points in history are called (valid) history tuples. The rest are invalid and represent nothing useful. See also *validity*, below. + +- **Historical subset**: A history tuple defines a subset of the history, namely all those history records in each history table that occur at or before the corresponding history id in the tuple. Under the assumption (see *Facts and assumptions*, below) that the temporal order of history records is the same as the history id order, "at or before" means that history id is less than or equal to the history id in the tuple. See also *validity*, below. 
+ +- **Validity** (of historical subset, of history tuple): + - A historical subset is valid if and only if it exhibits referential integrity. That is, all references by a history record in that subset to another history record must also be found in the subset. Otherwise (i.e., when there is a violation of referential integrity within the subset) the historical subset is not valid. + - A history tuple is valid if and only if it implies a valid database historical subset. + +- **Latest undeleted (LU) records**: Given a set $H$ of history records drawn from a single history table: + - Informally: The latest undeleted (LU) records are the latest (most recent) history records in that set, one for each item (distinguished by item id) not marked as deleted within the set. + - Formally: Define $LU(H) \subseteq H$ such that: + - For each item id $i$ present in $H$ (recall that item id's are not unique in history records): + - there is at most one history record $h_i \in LU(H)$ with item id $i$; + - $h_i$ has the largest history id of all history records in $H$ with item id $i$; + - that item has not been deleted ($h_i$ does not have the deleted flag set). + - If $H$ contains all records in a history table prior to some point, the LU set represents what that collection looked like at that point in time. + - Given a *valid historical subset*, the collections of LU records from each history table in the subset give us the state of the history-tracked collections at the point in time represented by the historical subset. ## Facts and assumptions **Facts** @@ -68,27 +85,6 @@ _TOC courtesy of [Lucio Paiva](https://luciopaiva.com/markdown-toc/)._ - However, that is not true if we allow bookmarking of non-latest states, which is probably going to be needed. We already have history, and we won't always anticipate future needs. Hmmm. - Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These should be consistent across all history tables; verify this thinking. 
These should be consistent across all history tables; verify this thinking. -## History tuples, historical subsets, and their validity - -Our goal here is to check whether a given tuple of history id's is valid, i.e., does it represent a real point in time of the history collections. This will be useful when attempting to define a bookmark association after the fact. - -**Historical subset**: A history tuple defines a subset of the history, namely all those history records in each history table that occur at or before the corresponding history id in the tuple. Under the assumption (see above) that the temporal order of history records is the same as the history id order, "at or before" means that history id is less than or equal to the history id in the tuple. - -**Validity**: A historical subset is valid if and only if it exhibits referential integrity. That is, all references by a history record in that subset to another history record must also be found in the subset. Otherwise (i.e., when there is a violation of referential integrity within the subset) the historical subset is not valid. - -A history tuple is valid if and only if it implies a valid database historical subset. - -**Algorithms for checking tuple validity**: Given this definition, algorithms for checking tuple validity are straightforward: - -1. The naive algorithm checks every reference in any given subset for presence in the referenced collection historical subset. But this is a huge number of records in most cases. -2. A less naive and much faster algorithm relies on the assumption that every actually occurring historical state of the database was valid (quite a reasonable assumption!), and therefore that history tables reflect that. This assumption allows us to check only that reference history id's are less than or equal to the corresponding collection history id in the tuple. - -**Uses**: - -1. Creating a (valid) bookmark association post hoc. -2. 
It is possible that a history tuple may be presented for checkout (see below) that is not known to be valid. -3. It is also possible that a single point of a single collection may be presented and we wish to construct a valid database historical state from it. Validity criteria tell us to do this. - ## History operations We'll need a small handful of operations related directly to history records. These form the foundation for bookmark operations. @@ -97,25 +93,10 @@ We'll need a small handful of operations related directly to history records. Th $Validate(S)$: Given a history tuple $S$, check that it is valid. Raise an error if it is not. -### Given a set of history records, get the latest undeleted (LU) records - -Suppose we have a set of history records (from a single collection). This set might be the entire history of the collection, or it might be a subset of those records, perhaps those occurring before a given history id or a given modification time. - -Given such a set, we want to extract the *latest undeleted* (LU) history records, one for each item id present in the set, and subject to the condition that there is no history record that represents a deletion of that item in the set. - -The LU set represents what the collection looked like at the point in time represented by the set. Essentially, we ignore all the earlier history, and all the records that have been deleted up to that point, retaining only the latest undeleted ones. - -Let $H$ be such a set of history records. Then $LU(H)$ is a subset of $H$ such that: -- For each item id $i$ present in $H$ (recall that item id's are not unique in history records): - - there is at most one history record in $h_i \in LU(H)$ with item id $i$; - - $h_i$ has the largest history id of all history records in $H$ with item id $i$; - - that item has not been deleted ($h_i$ does not have the deleted flag set). 
- -##### Relation to historical subsets - - When we have a historical subset (defined above), then the collections of LU records from each history table in the subset give us the state of the history-tracked collections at the point in time represented by the historical subset. The main table at the point in time represented by the history subset is given by $LU(H$) for each corresponding history table $H$. +1. The naive algorithm checks every reference in any given subset for presence in the referenced collection historical subset. But this is a huge number of records in most cases. +2. A less naive and much faster algorithm relies on the assumption that every actually occurring historical state of the database was valid (quite a reasonable assumption!), and therefore that history tables reflect that. This assumption allows us to check only that reference history id's are less than or equal to the corresponding collection history id in the tuple. -##### Implementation +### Given a set of history records, get the latest undeleted (LU) records It's straightforward to construct a query that yields the history id's of the LU items in a given collection. These history id's can then be used to extract part or all of the LU records from the history table. @@ -142,7 +123,7 @@ The `` could look like one of the following: - etc. Again looking a little forward in this document, a common case will be where the condition is related to one or more bookmarks. -##### Other notes +##### Notes - Fidelity to actual history would require that there are no gaps in the set of history records, i.e., that we haven't arbitrarily dropped some from the middle. However, that is not strictly necessary for these operations to be performed.) ## Bookmark operations @@ -216,19 +197,11 @@ QA releases and other updates are expected to arrive in large batches. If it is ##### Bracketing for regular ingestion (`crmprtd`) -Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. 
Typically, dozens to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, and does scale linearly in total observations, but it is nonetheless better than bookmarking one observation at a time. +Regular ingestion (via `crmprtd` and related scripts) occurs piecemeal. Typically, dozens to thousands of observations are ingested at a time (hourly, daily, weekly, or monthly, depending on the network). We can use bracketing for each such group of observations ingested. This is still much smaller than a typical QA release, and it does scale linearly in total observations, but it is nonetheless much better than bookmarking one observation at a time. ##### Considerations -It will be useful for the bookmarks used to bracket each ingestion to bear a clear and easily queried relationship to each other. This would enable an entire raw dataset to be extracted with simpler and more error-resistant queries. This could be done with disciplined naming in plain text, but that can be error prone and hard to debug. - -- This is where extending a bookmark association record with one or more adjunct columns would be useful. -- A fully normalized representation for such extensions includes: - - A single bookmark label can be associated multiple times to (groups of) observations. - - One attribute of the bookmark association distinguishes bracket-start and bracket-end. - - Another attribute of the association distinguishes the group. Time of ingestion is a natural discriminator for this, but it may be too restricted for more general cases of "many groups labelled by a single bookmark". Possibly this should be one or more of group index (non-null, integer), aux_info (nullable, text, which can encode a date or any other relevant information). 
- - Regarding grouping, I've rethought this and it seems as if it is redundant to other information available (namely, observation time, ingestion mod time/history id, bookmarking mod time/history id) and also potentially very confusing. -- For more details, see [[#Implementation notes]] notes below. +Bookmarks used to bracket each ingestion should bear a clear and easily queried relationship to each other. This enables an entire dataset (e.g., raw, QA'd) to be extracted with simple, error-resistant queries. The design of bookmark associations, specifically columns `role` and `bracket_begin_id`, supports this directly. ### Historical reconstruction or Rollback @@ -236,7 +209,7 @@ It will be useful for the bookmarks used to bracket each ingestion to bear a cle The design of history tracking makes it easy (although not necessarily *fast*) to reconstruct a point in history from a bookmark (or more precisely a bookmark association). In other terms, to produce the historical subset given a history tuple. -We already have the required tools in the definition of LU (latest updated) records. Rollback is just the operation of generating these records and storing them somewhere. +The definition of LU (latest undeleted) records provides the necessary tool. Rollback is just the operation of generating LU records and storing them somewhere. #### Implementation considerations @@ -290,23 +263,7 @@ Let `b_begin` and `b_end` represent tuples either externally provided history tu The bracketed set of updates may include multiple updates to a single item in a given collection. After the fact, we are only interested in the final outcome, which is the *latest* value for each collection item in the bracket. We must do a little extra work to obtain the latest indexes, which means taking the latest (equivalently, greatest) history id within the subset for each item id. -Here is the specific query for `meta_network_hx`. We can generalize to any history table from this example.
- -``` -SELECT - obs_raw_id, - max(obs_raw_hx_id) AS max_obs_raw_hx_id -FROM - obs_raw_hx -WHERE b_begin.obs_raw_hx_id < obs_raw_hx.obs_raw_hx_id - AND obs_raw_hx.obs_raw_hx_id <= b_end.obs_raw_hx_id -GROUP BY - obs_raw_hx.obs_raw_id -HAVING - NOT bool_or(obs_raw_hx.deleted) -``` - -These history id's, `max_obs_raw_hx_id`, select the latest records in the bracketed set. +See [[#Given a set of history records, get the latest undeleted (LU) records]]. ##### Caution! Bracketed sets vs valid historical subsets @@ -337,20 +294,7 @@ We can also form a subset based on time constraints. This is not fundamentally d Let `t_begin` and `t_end` be timestamps defining the time period of interest. We want to extract the subset of records that were inserted or updated in this period. -Here is the specific query for `obs_raw_hx`, which is the most likely target for this kind of query. We can generalize to any history table from this example. - -``` -SELECT - obs_raw_id, - max(obs_raw_hx_id) AS max_obs_raw_hx_id -FROM - obs_raw_hx -WHERE t_begin < mod_time AND mod_time <= t_end -GROUP BY - obs_raw_id -HAVING - NOT bool_or(deleted) -``` +See [[#Given a set of history records, get the latest undeleted (LU) records]]. All three considerations described above about what records are germane in the subset are important here: @@ -394,69 +338,38 @@ A: To support multiple uses of the same bookmark. - Brackets share the same bookmark info, but are associated as bracket-begin, bracket-end. - We likely want to bracket multiple groups of observations -- e.g., those ingested by `crmprtd` at any one time -- using the same bookmark. 
-| Column | Type | Remarks | -| ------------------------- | ----------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `bookmark_association_id` | `int` | PK | -| `bookmark_id` | `int` | FK `bookmarks` | -| `role` | ?`enumeration`; ?`int` (FK to role value table); ?`text`. Values: `'single'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | -| `bracket_id` | `int` | See discussion below. | -| ~~? `group`~~ | ~~?`int`; ?`timestamp`; ?`text`~~ | ~~DEPRECATED For distinguishing multiple associations of the same bookmark.
Specific usages: `crmprtd` ingestion; possibly for regular QA releases.
Will require discipline on the part of the user in order not to make a mess, particularly if the discriminator is textual. Currently I favour type `int` paired with optional `aux_info`.
May need better name; will depend in part on type (`timestamp` vs. more general `int` or `text`).~~ | -| `comment` | `text` | Nullable.
Auxiliary information about the association. | -| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | -| `meta_network_hx_id` | `int` | PK `meta_network_hx` | -| `meta_station_hx_id` | `int` | PK `meta_station_hx` | -| `meta_history_hx_id` | `int` | PK `meta_history_hx` | -| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | -TODO: Finish off rethinking below and adjust all documentation as necessary. - +| Column | Type | Remarks | +| ------------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- | +| `bookmark_association_id` | `int` | PK | +| `bookmark_id` | `int` | FK `bookmarks` | +| `role` | ?`enumeration`; ?`int` (FK to role value table); ?`text`. Values: `'singleton'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | +| `bracket_begin_id` | `int` | FK `bookmark_associations`. See discussion below. | +| `comment` | `text` | Nullable.
Auxiliary information about the association. | +| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | +| `meta_network_hx_id` | `int` | PK `meta_network_hx` | +| `meta_station_hx_id` | `int` | PK `meta_station_hx` | +| `meta_history_hx_id` | `int` | PK `meta_history_hx` | +| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | +| `mod_user` | `text` | | +| `mod_time` | `timestamp` | | Constraints: -- ~~unique (`bookmark_id`, `role`, `group`)~~ (see below) +- Tuple validity. +- Trigger function enforces constraint on `bracket_begin_id`. See discussion below. Questions: 1. Apply history tracking to this table? Reason, utility? Constraints on bookmark associations and role: -- We are planning to enforce uniqueness of (`bookmark_id`, `role`, `group`). This may not be sufficient to ensure sanity. -- What we really need to do is to enforce the ordering of uses of `role`. This really needs to require strict, ordered pairing of bracket-begin and bracket-end, with single only allowed outside pairs. Grammar for this would be `(single* (bracket_begin bracket_end)*)*` -- But, under what ordering of associations? We need to accommodate post-hoc associations. Duh ... under order of history id ... or equivalently mod_time. -- With that decided, it will be interesting to formulate a query that can evaluate the condition. The condition can be restated as: - - `single` can be followed by `single` or `bracket_begin` - - `bracket_begin` can be followed by `bracket_end` - - `bracket_end` can be followed by `single` or `bracket_begin` -- If a bookmark association is proposed to be added at history id `hx_id` , then find the most recent (largest history id) bookmark association with this label prior to hx_id and apply the above rules. This will have to be evaluated for all tables. - -Rethinking this: -- Does `group` have any utility at all? We have observation time, ingestion mod time/history id (in each history table associated to), bookmarking mod time/history id. -- So what if bookmarks overlap? 
What if we want it that way? Caveat user? -- We don't really have much knowledge at this about desirable or undesirable uses of bookmarks, and yet here we are trying to constrain their use. -- Leaving this unconstrained, and eliminating the likely useless and confusing `group` seems like the best way to go at the moment. - -But ... -- If you want to allow overlapping bookmarks (but nested still can be handled automatically, see below), then you need some way to pair bracket-begin and bracket-end. This is the role of some notion such as group. There is no other use for such an attribute. Therefore it should be named something like "bracket-id", and suitable constraints established on it (fingers crossed it's not very complex). -- Constraints: - - Bracket id is not relevant and therefore should be null, or should have a fixed value, say 0, for role single. - - There can be no more than one pair of bracket-begin bracket-end with the same bracket id. (We could write this `[id updates id]`, e.g., `[2 updates 2]` for short.) - - Bracket-begin with id $n$ is permitted if and only - - there is no other bracket-begin with id $n$ - - Bracket-end with id $n$ is permitted if and only if there is - - a bracket-begin with id $n$ - - no bracket-end with id $n$ - - We could provide a default, auto-incremented value for bracket id when the user does not set it, but: - - That invites some problems if the user is not aware of the attribute or doesn't grok how to use it. But see OTOH below for a way to minimize these problems. Essentially bracket-id is an auto-generated handle that the user should carry around. - - If there are several unpaired bracket-begins, then what bracket id to apply to the next unspecified bracket-end is ambiguous. - - Probably should make bracket-id not nullable for brackets. - - OTOH, could - - Return an auto-generated bracket-id from the create-association operation with bracket-begin and no bracket id specified. 
- - Auto-generate bracket-end id when there is only one unpaired bracket-begin (that one's bracket id). This seems a likely scenario. However, it is surpus to requirements if we assume/require the user to carry the auto-generated bracket-begin id. - - Raise an error if more than one open bracket. -- Note that the need for bracket id falls away if we require strict nesting of brackets. Then the nesting rules tell you exactly how the bracket pairs match up. That excludes non-nested overlap, which it is conceivable may be wanted (for what??). We don't have enough info to exclude it absolutely. -- Also, while the nesting rule is easy enough to formulate (as a CFG, for example), it is more complex to enforce in code, which would require a small parser and have to allow for open brackets. So perhaps bracket id is, in addition to being more flexible, also simpler. -- Any automatic value generation for bracket id will have to be done by a trigger function. Generated columns are not permitted to reference any other row than the one being modified. -- Jesus, this whole thing can be simplified down to bracket id = bookmark association id, and whenever that is specified, it must be for a bracket-end matching to a bracket-begin. All other cases must be null. + +- Singleton bookmarks are permitted in any pattern, no constraints except tuple validity. +- We allow any pattern of bracket bookmark associations: disjoint, nested, overlapping. This is because we have no current knowledge of what patterns will be useful in future, and no logical reasons to exclude any. +- The only constraints on brackets are: + - bracket-begin and bracket-end must occur in matching pairs (open brackets, i.e., unmatched bracket-begins, are permitted). + - a bracket-end must specify an open (not yet paired) bracket-begin that occurs before (in order of ascending `bookmark_association_id`) the bracket-begin. 
+- We could: + - Auto-generate `bracket_begin_id` for a bracket-end id when there is only one unpaired bracket-begin (that one's bracket id). This seems a likely scenario. However, it is surpus to requirements if we assume/require the user to carry the auto-generated bracket-begin id. ### Functions, stored procedures @@ -474,7 +387,7 @@ Since bookmarking is a non-trivial activity, it will be useful to encapsulate it ### Triggers -1. Enforce values of `mod_time`, `mod_user` in `bookmarks` and `bookmark_associations`. (As for history tracking; reuse tf.) +1. Enforce values of `mod_time`, `mod_user` in `bookmark_labels` and `bookmark_associations`. (As for history tracking; reuse tf.) ## Metadata support set @@ -521,43 +434,4 @@ A slightly less self-evident case of support - There may be no such metadata history record for some or all of the elements of $Sc(X)$. Therefore `St(X,T)` may not contain one item for every metadata record type. - Tag `T` can tag *any* metadata history record in an item's history set. Therefore the elements of `St(X,T)` may occur *before* the historical support items for $X$. This may or may not make sense in any given context. -It is possible to define other support sets with different criteria for what metadata history records are included, but defining the criteria so that they are consistent and make sense is harder. We do not offer any other definitions here. - -## Points in history - -***This discussion now appears irrelevant*** - -### Ordering of updates - -***Caution***: It is tempting -- and sometimes useful -- to think of a point in history in terms of the state of a single item in a collection. It must be borne in mind, however, that there are many items in any given collection and so a point in history is actually a large set, not just one item. - -History advances one change at a time. Updates to records are essentially serial. 
*Is that true in all contexts, e.g., in the context of a transaction?* - -Let's consider a simplified situation: An observations collection O and a metadata collection M to which observations are linked. - -A given point P in history comprises -- The states of items in O. -- The states of items in M. - -At point P in history, a read query is made to the main tables (which is the only case we are considering here; main tables are the primary interface). - -Only the states of items in O and items in M determine the result of the query at point P. How they got to those states -- their past histories -- are irrelevant to the result, except in cases where modification time is part of the query. Let's set that case aside for the moment. - -Consider a single item o in O and the item m in M that it links to. The value (state) of o and of m could have been reached by two different histories, i.e., two different orderings between the most recent two updates to o and h: - -1. Update o, then update m. -2. Update m, then update o. - -This extends to all items in O and all items in M: their joint states can be the product of many different orderings of updates. - -Upshot: Any given point in history can be reached by many different sequences of updates. - -### Subsequent points in history - -OK, so what? - -Let $P$, $Q$, and $R$ be a time-ordered sequence of points in history. That is, $P$ precedes $Q$ precedes $R$ in temporal order. Let the updates from $P \to Q$ be denoted $U_Q$, and the updates from $Q \to R$ be denoted $U_R$. - -Then at point $R$, any interleaving of updates in $U_Q$ and $U_R$ are equivalent (neglecting mod time), so long as the ordering of updates to any single item is preserved. In short, it doesn't matter how we got to $R$, so long as we don't reorder updates to the same item. - -This is relevant when considering what points to bookmark, and what bookmarks to choose as a rollback point. 
\ No newline at end of file +It is possible to define other support sets with different criteria for what metadata history records are included, but defining the criteria so that they are consistent and make sense is harder. We do not offer any other definitions here. \ No newline at end of file diff --git a/pycds/orm/tables.py b/pycds/orm/tables.py index b77f1b25..d36e915d 100644 --- a/pycds/orm/tables.py +++ b/pycds/orm/tables.py @@ -652,7 +652,7 @@ class BookmarkAssociationRole(Enum): https://stackoverflow.com/a/73922844 """ - single = "single" + singleton = "singleton" bracket_begin = "bracket_begin" bracket_end = "bracket_end" From 60f74e6431362bd0775b6b27aa0d1a9df7eda1ef Mon Sep 17 00:00:00 2001 From: rod-glover Date: Mon, 8 Sep 2025 09:54:38 -0700 Subject: [PATCH 08/13] WIP --- docs/bookmarks.md | 7 +- pycds/orm/functions/bookmarking.py | 46 +++++ pycds/orm/functions/bookmarking.sql | 297 ++++++++++++++++++++++++++++ 3 files changed, 348 insertions(+), 2 deletions(-) create mode 100644 pycds/orm/functions/bookmarking.py create mode 100644 pycds/orm/functions/bookmarking.sql diff --git a/docs/bookmarks.md b/docs/bookmarks.md index ccedce51..5abce085 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -96,7 +96,7 @@ $Validate(S)$: Given a history tuple $S$, check that it is valid. Raise an error 1. The naive algorithm checks every reference in any given subset for presence in the referenced collection historical subset. But this is a huge number of records in most cases. 2. A less naive and much faster algorithm relies on the assumption that every actually occurring historical state of the database was valid (quite a reasonable assumption!), and therefore that history tables reflect that. This assumption allows us to check only that reference history id's are less than or equal to the corresponding collection history id in the tuple. 
-### Given a set of history records, get the latest undeleted (LU) records +### Get the latest undeleted (LU) records, given a subset of history records It's straightforward to construct a query that yields the history id's of the LU items in a given collection. These history id's can then be used to extract part or all of the LU records from the history table. @@ -162,6 +162,9 @@ History records between $L_1$ (exclusive) and $L_2$ (inclusive) are exactly and - their isolation in the transaction (so no other update operations are interleaved); - the fact that change records are appended to the history tables in temporal order of change operations. +**Notes**: +- The above definition is abstract. It's not + **For further discussion and analysis**: - Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark label $L$, and allow the system to construct bookmarks $L_1$ and $L_2$ from $L$. In fact, we use the same label, and the bookmark association carries the distinction between $L_1$ and $L_2$. We can then define $Bracket(L, U) = Bracket(L_1, U, L_2)$, where $L_1$ and $L_2$ are the constructed bookmarks. See use of auxiliary columns in bookmark association table in [[#Implementation notes]] below. @@ -369,7 +372,7 @@ Constraints on bookmark associations and role: - bracket-begin and bracket-end must occur in matching pairs (open brackets, i.e., unmatched bracket-begins, are permitted). - a bracket-end must specify an open (not yet paired) bracket-begin that occurs before (in order of ascending `bookmark_association_id`) the bracket-begin. - We could: - - Auto-generate `bracket_begin_id` for a bracket-end id when there is only one unpaired bracket-begin (that one's bracket id). This seems a likely scenario. However, it is surpus to requirements if we assume/require the user to carry the auto-generated bracket-begin id. 
+ - Auto-generate `bracket_begin_id` for a bracket-end id when there is only one unpaired bracket-begin (that one's bracket id). This seems a likely scenario. However, it is surplus to requirements if we assume/require the user to carry the auto-generated bracket-begin id. ### Functions, stored procedures diff --git a/pycds/orm/functions/bookmarking.py b/pycds/orm/functions/bookmarking.py new file mode 100644 index 00000000..e5413f73 --- /dev/null +++ b/pycds/orm/functions/bookmarking.py @@ -0,0 +1,46 @@ +"""Functions, stored procedures, and trigger functions supporting bookmark operations. + +TODO: Rename to Alembic version. + +TODO: Define ReplaceableProcedure and supporting SQLA components, parallel to + ReplaceableFunction to create stored procedures. + See https://www.postgresql.org/docs/current/sql-createprocedure.html + Maybe a separate branch/PR and migration preceding this one. Yak shaving. + For now, use ReplaceableFunction ... DO NOT FORGET THIS. + +TODO: + Functions. Possibilities + - get LU + - validate history tuple + - ? create bookmark label + - create bookmark association + - create bookmark association now + - bracket updates + - create rollback + Trigger functions: + - +""" + +from pycds.alembic.extensions.replaceable_objects import ReplaceableFunction +from pycds.context import get_schema_name + + +schema_name = get_schema_name() + + +# Get LU history ids. +# This follows the pattern of other hxtk_ utility functions (receives collection name, +# etc.). +# Arguments: +# collection name +# where condition +# +# Returns table of history id's for LU set from collection history satisfying condition. +hxtk_get_latest_undeleted_hx_ids = ReplaceableFunction( + +) + + +# hxtk_create_bookmark_label = # Necessary? This is just an insert to the table. 
+ + diff --git a/pycds/orm/functions/bookmarking.sql b/pycds/orm/functions/bookmarking.sql new file mode 100644 index 00000000..8740d3b6 --- /dev/null +++ b/pycds/orm/functions/bookmarking.sql @@ -0,0 +1,297 @@ +CREATE OR REPLACE FUNCTION hxtk_make_query_latest_undeleted_hx_ids( + collection_name text, + collection_id_name text, + where_condition text = 'true' +) + RETURNS SETOF RECORD + LANGUAGE 'plpgsql' +AS +$BODY$ + -- This function returns text containing a query that returns the history id's for the LU records, + -- given a collection spec and a where condition. +DECLARE + hx_table_name text := hxtk_hx_table_name(collection_name); + hx_id_name text := hxtk_hx_id_name(collection_name); + q text := format( + 'SELECT hx.%1$I, max(hx.%2$I) ' || + 'FROM %3$I hx ' || + 'WHERE %4$I ' || + 'GROUP BY hx.%1$I ' || + 'HAVING NOT bool_or(hx.deleted) ', + collection_id_name, + hx_id_name, + hx_table_name, + where_condition + ); +BEGIN + RAISE NOTICE '%', q; + RETURN q; +END; +$BODY$; + + +CREATE OR REPLACE FUNCTION hxtk_get_latest_undeleted_hx_ids( + collection_name text, + collection_id_name text, + where_condition text = 'true' +) + RETURNS SETOF RECORD + LANGUAGE 'plpgsql' +AS +$BODY$ + -- This function returns the history id's for the LU records, given a collection and a where condition. +BEGIN + RETURN QUERY EXECUTE hxtk_make_query_latest_undeleted_hx_ids( + collection_name, collection_id_name, where_condition); +END; +$BODY$; + + +CREATE OR REPLACE FUNCTION hxtk_get_latest_undeleted_hx_records( + collection_name text, + collection_id_name text, + where_condition text = 'true' +) + RETURNS SETOF RECORD + LANGUAGE 'plpgsql' +AS +$BODY$ + -- This function returns the full history records for the LU records, given a collection and a where condition. 
+DECLARE + hx_table_name text := hxtk_hx_table_name(collection_name); + hx_id_name text := hxtk_hx_id_name(collection_name); + hx_ids_query text := hxtk_make_query_latest_undeleted_hx_ids( + collection_name, collection_id_name, where_condition); +BEGIN + RETURN QUERY EXECUTE format('SELECT * FROM %I WHERE %I IN (%s)', hx_table_name, hx_id_name, hx_ids_query); +END; +$BODY$; + + +CREATE OR REPLACE FUNCTION hxtk_is_valid_history_tuple(hx_tuple history_tuple) + RETURNS boolean + LANGUAGE 'plpgsql' +AS +$BODY$ + -- This function returns true if and only if the provided history tuple implies a valid + -- history subset. See documentation for definitions of validity. +BEGIN + -- If this tuple is the current point in history, we know it is valid. + IF + hx_tuple.obs_raw_hx_id = (SELECT max(obs_raw_hx.obs_raw_hx_id) FROM obs_raw_hx) AND + hx_tuple.meta_history_hx_id = (SELECT max(meta_history_hx.meta_history_hx_id) FROM meta_history_hx) AND + hx_tuple.meta_station_hx_id = (SELECT max(meta_station_hx.meta_station_hx_id) FROM meta_station_hx) AND + hx_tuple.meta_network_hx_id = (SELECT max(meta_network_hx.meta_network_hx_id) FROM meta_network_hx) AND + hx_tuple.meta_vars_hx_id = (SELECT max(meta_vars_hx.meta_vars_hx_id) FROM meta_vars_hx) + THEN + RETURN TRUE; + END IF; + + -- This query ANDs together sub-queries for each table. + -- Warning: The query against `obs_raw_hx` could take a long time. + RETURN QUERY SELECT + -- It's tempting to DRY up the sub-queries below into a generic make-query function, + -- but it's probably more work than it's worth. Advantage would be getting the pattern + -- right just once, and easy extension to additional cases. 
+ (SELECT + bool_and( + obs_raw_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id AND + obs_raw_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id + ) + FROM + obs_raw_hx + WHERE + obs_raw_hx.obs_raw_hx_id <= hx_tuple.obs_raw_hx_id) + AND (SELECT + bool_and(meta_history_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) + FROM + meta_history_hx + WHERE + meta_history_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id) + AND (SELECT + bool_and(meta_station_hx.meta_network_hx_id <= hx_tuple.meta_network_hx_id) + FROM + meta_station_hx + WHERE + meta_station_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) + AND (SELECT + bool_and(meta_vars_hx.meta_network_hx_id <= hx_tuple.meta_network_hx_id) + FROM + meta_vars_hx + WHERE + meta_vars_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id); + + -- The above query can be reformulated as follows. This may be much more efficient if the correct indexes exist + -- and are selected by the query planner. + -- TODO: Determine which query is best. +-- RETURN QUERY SELECT +-- (SELECT +-- max(obs_raw_hx.meta_history_hx_id) <= hx_tuple.meta_history_hx_id AND +-- max(obs_raw_hx.meta_vars_hx_id) <= hx_tuple.meta_vars_hx_id +-- FROM +-- obs_raw_hx +-- WHERE +-- obs_raw_hx.obs_raw_hx_id <= hx_tuple.obs_raw_hx_id) +-- AND +-- (SELECT +-- max(meta_history_hx.meta_station_hx_id) <= hx_tuple.meta_station_hx_id +-- FROM +-- meta_history_hx +-- WHERE +-- meta_history_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id) +-- AND +-- (SELECT +-- max(meta_station_hx.meta_network_hx_id) <= hx_tuple.meta_network_hx_id +-- FROM +-- meta_station_hx +-- WHERE +-- meta_station_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) +-- AND +-- (SELECT +-- max(meta_vars_hx.meta_network_hx_id) <= hx_tuple.meta_network_hx_id +-- FROM +-- meta_vars_hx +-- WHERE +-- meta_vars_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id); +END; +$BODY$; + + +-- Can this and should this be used as a column in table bookmark_associations? 
+CREATE TYPE history_tuple AS ( + obs_raw_hx_id bigint, + meta_history_hx_id int, + meta_station_hx_id int, + meta_network_hx_id int, + meta_vars_hx_id int +); + + +CREATE TABLE IF NOT EXISTS bookmark_labels ( + bookmark_label_id int PRIMARY KEY , + network_id int REFERENCES meta_network(network_id), + label text NOT NULL, + comment text, + mod_time timestamp NOT NULL DEFAULT now(), + mod_user text NOT NULL DEFAULT current_user +); + + +CREATE TABLE IF NOT EXISTS bookmark_associations ( + bookmark_association_id int PRIMARY KEY, + bookmark_label_id int REFERENCES bookmark_labels(bookmark_label_id), + role text NOT NULL , -- Should be enum type + bracket_begin_id int REFERENCES bookmark_associations(bookmark_association_id), + comment text, + hx_tuple history_tuple, + mod_time timestamp NOT NULL DEFAULT now(), + mod_user text NOT NULL DEFAULT current_user +); + + +-- Example usages + +DO $$ + BEGIN ; + INSERT INTO + COMMIT ; +$$; + + +-- Trigger functions + +CREATE OR REPLACE FUNCTION hxtk_validate_history_tuple() + RETURNS trigger + LANGUAGE 'plpgsql' +AS +$BODY$ + -- This trigger function validates the history tuple provided to table bookmark_associations. + -- + -- Usage: + -- CREATE TRIGGER t100_validate_history_tuple + -- BEFORE INSERT OR UPDATE + -- ON bookmark_associations + -- EXECUTE PROCEDURE hxtk_validate_history_tuple() +DECLARE +BEGIN + IF NOT hxtk_is_valid_history_tuple(NEW.hx_tuple) THEN + RAISE 'Invalid history tuple'; + END IF; +END; +$BODY$; + + +CREATE OR REPLACE FUNCTION hxtk_check_bracket_end() + RETURNS trigger + LANGUAGE 'plpgsql' +AS +$BODY$ + -- This trigger function checks that a bracket-end insert to table references an open + -- bracket (bracket_begin_id), or if it references no bracket and there is only one + -- open one, it provides that value. 
+ -- + -- Usage: + -- CREATE TRIGGER t200_check_bracket_end + -- BEFORE INSERT OR UPDATE + -- ON bookmark_associations + -- EXECUTE PROCEDURE hxtk_check_bracket_end() +DECLARE + max_ba_id int; + ba_id_count int; +BEGIN + IF NEW.bracket_begin_id IS NULL THEN + -- Check if we have one open bracket. Use it if so. + SELECT + max(bookmark_association_id), + count(*) + INTO STRICT max_ba_id, ba_id_count + FROM + bookmark_associations + WHERE + role = 'bracket_begin'; + IF ba_id_count = 1 THEN + NEW.bracket_begin_id := max_ba_id; + ELSE + RAISE EXCEPTION 'bracket_begin_id is null, and there are % open brackets.', ba_id_count; + END IF; + ELSE + -- Check whether this bracket is open. Error if it is not. + SELECT + count(*) + INTO ba_id_count + FROM + bookmark_associations + WHERE + role = 'bracket_end' + AND bracket_end_id = NEW.bracket_begin_id; + IF ba_id_count > 0 THEN + RAISE EXCEPTION 'The bracket with id % is already closed.'; + END IF; + END IF; +END; +$BODY$; + + +CREATE OR REPLACE FUNCTION hxtk_() + RETURNS text -- FIXME + LANGUAGE 'plpgsql' +AS +$BODY$ +DECLARE +BEGIN +END; +$BODY$; + + +---- Template +CREATE OR REPLACE FUNCTION hxtk_() + RETURNS text -- FIXME + LANGUAGE 'plpgsql' +AS +$BODY$ +DECLARE +BEGIN +END; +$BODY$; + + From 296b9669d6fb5daa6dcbe733011c6df38fa05dd6 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Mon, 8 Sep 2025 20:59:09 -0700 Subject: [PATCH 09/13] Add essential fns and demos of them --- pycds/orm/functions/bookmarking.sql | 514 ++++++++++++++++++++-------- 1 file changed, 377 insertions(+), 137 deletions(-) diff --git a/pycds/orm/functions/bookmarking.sql b/pycds/orm/functions/bookmarking.sql index 8740d3b6..757c3782 100644 --- a/pycds/orm/functions/bookmarking.sql +++ b/pycds/orm/functions/bookmarking.sql @@ -1,9 +1,10 @@ +----------------- CREATE OR REPLACE FUNCTION hxtk_make_query_latest_undeleted_hx_ids( collection_name text, collection_id_name text, - where_condition text = 'true' + where_condition text = 'TRUE' ) - RETURNS SETOF 
RECORD + RETURNS text LANGUAGE 'plpgsql' AS $BODY$ @@ -13,9 +14,9 @@ DECLARE hx_table_name text := hxtk_hx_table_name(collection_name); hx_id_name text := hxtk_hx_id_name(collection_name); q text := format( - 'SELECT hx.%1$I, max(hx.%2$I) ' || + 'SELECT hx.%1$I AS %1$I, max(hx.%2$I) AS %2$I ' || 'FROM %3$I hx ' || - 'WHERE %4$I ' || + 'WHERE %4$s ' || 'GROUP BY hx.%1$I ' || 'HAVING NOT bool_or(hx.deleted) ', collection_id_name, @@ -24,12 +25,23 @@ DECLARE where_condition ); BEGIN - RAISE NOTICE '%', q; + -- RAISE NOTICE '%', q; RETURN q; END; $BODY$; +DO LANGUAGE 'plpgsql' +$$ + BEGIN + RAISE NOTICE 'hxtk_make_query_latest_undeleted_hx_ids: %', + hxtk_make_query_latest_undeleted_hx_ids('meta_network', 'network_id'); + END; +$$; + + + +----------------- CREATE OR REPLACE FUNCTION hxtk_get_latest_undeleted_hx_ids( collection_name text, collection_id_name text, @@ -47,6 +59,27 @@ END; $BODY$; +DO LANGUAGE 'plpgsql' +$$ + DECLARE + r record; + BEGIN + RAISE NOTICE '(network_id, meta_network_hx_id)'; + FOR r IN + SELECT * + FROM + hxtk_get_latest_undeleted_hx_ids('meta_network', 'network_id') + AS t(network_id int, meta_network_hx_id int) + ORDER BY network_id + LOOP + RAISE NOTICE '%', r; + END LOOP; + END ; +$$; + + + +----------------- CREATE OR REPLACE FUNCTION hxtk_get_latest_undeleted_hx_records( collection_name text, collection_id_name text, @@ -63,11 +96,91 @@ DECLARE hx_ids_query text := hxtk_make_query_latest_undeleted_hx_ids( collection_name, collection_id_name, where_condition); BEGIN - RETURN QUERY EXECUTE format('SELECT * FROM %I WHERE %I IN (%s)', hx_table_name, hx_id_name, hx_ids_query); + RETURN QUERY EXECUTE + format( + 'SELECT * FROM %1$I WHERE %2$I IN (SELECT t.%2$I FROM (%3$s) AS t)', + hx_table_name, hx_id_name, hx_ids_query + ); +END; +$BODY$; + + +DO LANGUAGE 'plpgsql' +$$ + DECLARE + r record; + BEGIN + RAISE NOTICE 'LU(meta_network_hx)'; + RAISE NOTICE '(meta_network_hx_id, network_id, network_name)'; + FOR r IN + SELECT + meta_network_hx_id, + 
network_id, + network_name + FROM + hxtk_get_latest_undeleted_hx_records('meta_network', 'network_id') + AS t(network_id integer, + network_name varchar(255), + description varchar(255), + virtual varchar(255), + publish boolean, + col_hex varchar(7), + contact_id integer, + mod_time timestamp, + mod_user varchar(64), + deleted boolean, + meta_network_hx_id integer) + LOOP + RAISE NOTICE '%', r; + END LOOP; + END ; +$$; + + + +----------------- +-- Can this and should this be used as a column in table bookmark_associations? +CREATE TYPE history_tuple AS ( + obs_raw_hx_id bigint, + meta_history_hx_id int, + meta_station_hx_id int, + meta_network_hx_id int, + meta_vars_hx_id int +); + + +CREATE OR REPLACE FUNCTION hxtk_current_hx_tuple() + RETURNS history_tuple + LANGUAGE 'plpgsql' +AS +$BODY$ + -- This function returns the current history tuple. +DECLARE + result history_tuple; +BEGIN + SELECT + (SELECT max(obs_raw_hx.obs_raw_hx_id) FROM obs_raw_hx) AS obs_raw_hx_id, + (SELECT max(meta_history_hx.meta_history_hx_id) FROM meta_history_hx) AS meta_history_hx_id, + (SELECT max(meta_station_hx.meta_station_hx_id) FROM meta_station_hx) AS meta_station_hx_id, + (SELECT max(meta_network_hx.meta_network_hx_id) FROM meta_network_hx) AS meta_network_hx_id, + (SELECT max(meta_vars_hx.meta_vars_hx_id) FROM meta_vars_hx) AS meta_vars_hx_id + INTO STRICT result; + RETURN result; END; $BODY$; +DO LANGUAGE 'plpgsql' +$$ + BEGIN + RAISE NOTICE '(obs_raw_hx_id, meta_history_hx_id, meta_station_hx_id int, meta_network_hx_id int, meta_vars_hx_id int)'; + RAISE NOTICE '%', hxtk_current_hx_tuple(); + END ; +$$; + + + +----------------- CREATE OR REPLACE FUNCTION hxtk_is_valid_history_tuple(hx_tuple history_tuple) RETURNS boolean LANGUAGE 'plpgsql' @@ -75,101 +188,133 @@ AS $BODY$ -- This function returns true if and only if the provided history tuple implies a valid -- history subset. See documentation for definitions of validity. 
+DECLARE + obs_raw_ok boolean; + meta_history_ok boolean; + meta_station_ok boolean; + meta_network_ok boolean; + meta_vars_ok boolean; + t_start timestamp; + t_end timestamp; BEGIN + RAISE NOTICE 'hxtk_is_valid_history_tuple %', hx_tuple; + + IF hx_tuple IS NULL THEN + RETURN FALSE; + END IF; + -- If this tuple is the current point in history, we know it is valid. IF - hx_tuple.obs_raw_hx_id = (SELECT max(obs_raw_hx.obs_raw_hx_id) FROM obs_raw_hx) AND - hx_tuple.meta_history_hx_id = (SELECT max(meta_history_hx.meta_history_hx_id) FROM meta_history_hx) AND - hx_tuple.meta_station_hx_id = (SELECT max(meta_station_hx.meta_station_hx_id) FROM meta_station_hx) AND - hx_tuple.meta_network_hx_id = (SELECT max(meta_network_hx.meta_network_hx_id) FROM meta_network_hx) AND - hx_tuple.meta_vars_hx_id = (SELECT max(meta_vars_hx.meta_vars_hx_id) FROM meta_vars_hx) + hx_tuple = hxtk_current_hx_tuple() THEN + RAISE NOTICE '= current hx tuple'; RETURN TRUE; END IF; - -- This query ANDs together sub-queries for each table. -- Warning: The query against `obs_raw_hx` could take a long time. - RETURN QUERY SELECT - -- It's tempting to DRY up the sub-queries below into a generic make-query function, - -- but it's probably more work than it's worth. Advantage would be getting the pattern - -- right just once, and easy extension to additional cases. 
- (SELECT - bool_and( - obs_raw_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id AND - obs_raw_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id - ) - FROM - obs_raw_hx - WHERE - obs_raw_hx.obs_raw_hx_id <= hx_tuple.obs_raw_hx_id) - AND (SELECT - bool_and(meta_history_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) - FROM - meta_history_hx - WHERE - meta_history_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id) - AND (SELECT - bool_and(meta_station_hx.meta_network_hx_id <= hx_tuple.meta_network_hx_id) - FROM - meta_station_hx - WHERE - meta_station_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) - AND (SELECT - bool_and(meta_vars_hx.meta_network_hx_id <= hx_tuple.meta_network_hx_id) - FROM - meta_vars_hx - WHERE - meta_vars_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id); - - -- The above query can be reformulated as follows. This may be much more efficient if the correct indexes exist - -- and are selected by the query planner. + + -- This query is likely more efficient if the correct indexes exist and are selected by the query planner. -- TODO: Determine which query is best. 
+ + RAISE NOTICE 'Checking in detail'; + t_start := clock_timestamp(); + RAISE NOTICE 'Starting obs_raw_hx at %', t_start; + SELECT + max(obs_raw_hx.meta_history_hx_id) <= hx_tuple.meta_history_hx_id AND + max(obs_raw_hx.meta_vars_hx_id) <= hx_tuple.meta_vars_hx_id + FROM + obs_raw_hx + WHERE + obs_raw_hx.obs_raw_hx_id <= hx_tuple.obs_raw_hx_id + INTO STRICT obs_raw_ok; + t_end := clock_timestamp(); + RAISE NOTICE 'Finished obs_raw_hx at %; time elapsed %', t_end, t_end - t_start; + + SELECT + max(meta_history_hx.meta_station_hx_id) <= hx_tuple.meta_station_hx_id + FROM + meta_history_hx + WHERE + meta_history_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id + INTO STRICT meta_history_ok; + + SELECT + max(meta_station_hx.meta_network_hx_id) <= hx_tuple.meta_network_hx_id + FROM + meta_station_hx + WHERE + meta_station_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id + INTO STRICT meta_station_ok; + + SELECT + max(meta_vars_hx.meta_network_hx_id) <= hx_tuple.meta_network_hx_id + FROM + meta_vars_hx + WHERE + meta_vars_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id + INTO STRICT meta_vars_ok; + + RETURN obs_raw_ok AND meta_history_ok AND meta_station_ok AND meta_network_ok AND meta_vars_ok; + + -- This is probably a slower query. If used, reformulate as a series of separate queries like above. 
-- RETURN QUERY SELECT --- (SELECT --- max(obs_raw_hx.meta_history_hx_id) <= hx_tuple.meta_history_hx_id AND --- max(obs_raw_hx.meta_vars_hx_id) <= hx_tuple.meta_vars_hx_id --- FROM --- obs_raw_hx --- WHERE --- obs_raw_hx.obs_raw_hx_id <= hx_tuple.obs_raw_hx_id) --- AND --- (SELECT --- max(meta_history_hx.meta_station_hx_id) <= hx_tuple.meta_station_hx_id --- FROM --- meta_history_hx --- WHERE --- meta_history_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id) --- AND --- (SELECT --- max(meta_station_hx.meta_network_hx_id) <= hx_tuple.meta_network_hx_id --- FROM --- meta_station_hx --- WHERE --- meta_station_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) --- AND --- (SELECT --- max(meta_vars_hx.meta_network_hx_id) <= hx_tuple.meta_network_hx_id --- FROM --- meta_vars_hx --- WHERE --- meta_vars_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id); +-- -- It's tempting to DRY up the sub-queries below into a generic make-query function, +-- -- but it's probably more work than it's worth. Advantage would be getting the pattern +-- -- right just once, and easy extension to additional cases. 
+-- (SELECT +-- bool_and( +-- obs_raw_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id AND +-- obs_raw_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id +-- ) +-- FROM +-- obs_raw_hx +-- WHERE +-- obs_raw_hx.obs_raw_hx_id <= hx_tuple.obs_raw_hx_id) +-- AND (SELECT +-- bool_and(meta_history_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) +-- FROM +-- meta_history_hx +-- WHERE +-- meta_history_hx.meta_history_hx_id <= hx_tuple.meta_history_hx_id) +-- AND (SELECT +-- bool_and(meta_station_hx.meta_network_hx_id <= hx_tuple.meta_network_hx_id) +-- FROM +-- meta_station_hx +-- WHERE +-- meta_station_hx.meta_station_hx_id <= hx_tuple.meta_station_hx_id) +-- AND (SELECT +-- bool_and(meta_vars_hx.meta_network_hx_id <= hx_tuple.meta_network_hx_id) +-- FROM +-- meta_vars_hx +-- WHERE +-- meta_vars_hx.meta_vars_hx_id <= hx_tuple.meta_vars_hx_id); + END; $BODY$; --- Can this and should this be used as a column in table bookmark_associations? -CREATE TYPE history_tuple AS ( - obs_raw_hx_id bigint, - meta_history_hx_id int, - meta_station_hx_id int, - meta_network_hx_id int, - meta_vars_hx_id int -); +DO LANGUAGE 'plpgsql' +$$ + DECLARE + curr_hx_tuple history_tuple := hxtk_current_hx_tuple(); + bad_hx_tuple history_tuple := + (curr_hx_tuple.obs_raw_hx_id, + curr_hx_tuple.meta_history_hx_id - 1, + curr_hx_tuple.meta_station_hx_id - 1, + curr_hx_tuple.meta_network_hx_id - 1, + curr_hx_tuple.meta_vars_hx_id - 1 + )::history_tuple; + BEGIN + RAISE NOTICE 'current: %', hxtk_is_valid_history_tuple(curr_hx_tuple); + RAISE NOTICE 'bad: %', hxtk_is_valid_history_tuple(bad_hx_tuple); + END ; +$$; +----------------- CREATE TABLE IF NOT EXISTS bookmark_labels ( - bookmark_label_id int PRIMARY KEY , - network_id int REFERENCES meta_network(network_id), + bookmark_label_id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, + network_id int REFERENCES meta_network (network_id), label text NOT NULL, comment text, mod_time timestamp NOT NULL DEFAULT now(), @@ -178,26 +323,17 @@ CREATE 
TABLE IF NOT EXISTS bookmark_labels ( CREATE TABLE IF NOT EXISTS bookmark_associations ( - bookmark_association_id int PRIMARY KEY, - bookmark_label_id int REFERENCES bookmark_labels(bookmark_label_id), - role text NOT NULL , -- Should be enum type - bracket_begin_id int REFERENCES bookmark_associations(bookmark_association_id), + bookmark_association_id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, + bookmark_label_id int REFERENCES bookmark_labels (bookmark_label_id), + role text NOT NULL, -- Should be enum type + bracket_begin_id int REFERENCES bookmark_associations (bookmark_association_id), comment text, - hx_tuple history_tuple, + hx_tuple history_tuple NOT NULL DEFAULT hxtk_current_hx_tuple(), mod_time timestamp NOT NULL DEFAULT now(), mod_user text NOT NULL DEFAULT current_user ); --- Example usages - -DO $$ - BEGIN ; - INSERT INTO - COMMIT ; -$$; - - -- Trigger functions CREATE OR REPLACE FUNCTION hxtk_validate_history_tuple() @@ -214,14 +350,16 @@ $BODY$ -- EXECUTE PROCEDURE hxtk_validate_history_tuple() DECLARE BEGIN + RAISE NOTICE 'hxtk_validate_history_tuple: NEW = %', NEW; IF NOT hxtk_is_valid_history_tuple(NEW.hx_tuple) THEN RAISE 'Invalid history tuple'; END IF; + RETURN NEW; END; $BODY$; -CREATE OR REPLACE FUNCTION hxtk_check_bracket_end() +CREATE OR REPLACE FUNCTION hxtk_bm_check_bracket_end() RETURNS trigger LANGUAGE 'plpgsql' AS @@ -234,64 +372,166 @@ $BODY$ -- CREATE TRIGGER t200_check_bracket_end -- BEFORE INSERT OR UPDATE -- ON bookmark_associations - -- EXECUTE PROCEDURE hxtk_check_bracket_end() + -- EXECUTE PROCEDURE hxtk_check_bm_bracket_end() DECLARE + bracket_begin_q text := + format( + 'SELECT count(*), max(bookmark_association_id) FROM %I.%I ' || + 'WHERE role = ''bracket_begin''', + TG_TABLE_SCHEMA, TG_TABLE_NAME + ); + bracket_end_q text := + format( + 'SELECT count(*) FROM %I.%I ' || + 'WHERE role = ''bracket_end'' ' || + ' AND bracket_begin_id = %L', + TG_TABLE_SCHEMA, TG_TABLE_NAME, NEW.bracket_begin_id + ); max_ba_id int; 
ba_id_count int; BEGIN IF NEW.bracket_begin_id IS NULL THEN -- Check if we have one open bracket. Use it if so. - SELECT - max(bookmark_association_id), - count(*) - INTO STRICT max_ba_id, ba_id_count - FROM - bookmark_associations - WHERE - role = 'bracket_begin'; + EXECUTE bracket_begin_q INTO STRICT ba_id_count, max_ba_id; IF ba_id_count = 1 THEN NEW.bracket_begin_id := max_ba_id; ELSE - RAISE EXCEPTION 'bracket_begin_id is null, and there are % open brackets.', ba_id_count; + RAISE EXCEPTION 'bracket_begin_id is unspecified, and there are % > 1 open brackets.', ba_id_count; END IF; ELSE -- Check whether this bracket is open. Error if it is not. - SELECT - count(*) - INTO ba_id_count - FROM - bookmark_associations - WHERE - role = 'bracket_end' - AND bracket_end_id = NEW.bracket_begin_id; + EXECUTE bracket_end_q INTO ba_id_count; IF ba_id_count > 0 THEN - RAISE EXCEPTION 'The bracket with id % is already closed.'; + RAISE EXCEPTION 'The bracket with id % is already closed.', NEW.bracket_begin_id; END IF; END IF; + RETURN NEW; END; $BODY$; -CREATE OR REPLACE FUNCTION hxtk_() - RETURNS text -- FIXME - LANGUAGE 'plpgsql' -AS -$BODY$ -DECLARE -BEGIN -END; -$BODY$; +DROP TRIGGER IF EXISTS t100_validate_history_tuple ON bookmark_associations; +CREATE TRIGGER t100_validate_history_tuple + BEFORE INSERT OR UPDATE + ON bookmark_associations + FOR EACH ROW +EXECUTE PROCEDURE hxtk_validate_history_tuple(); ----- Template -CREATE OR REPLACE FUNCTION hxtk_() - RETURNS text -- FIXME - LANGUAGE 'plpgsql' -AS -$BODY$ -DECLARE -BEGIN -END; -$BODY$; +DROP TRIGGER IF EXISTS t200_check_bracket_end ON bookmark_associations; +CREATE TRIGGER t200_check_bracket_end + BEFORE INSERT OR UPDATE + ON bookmark_associations + FOR EACH ROW + WHEN ( NEW.role = 'bracket_end' ) +EXECUTE PROCEDURE hxtk_bm_check_bracket_end(); + + +----------------- +BEGIN; -- begin transaction for tests; roll back at end +DO LANGUAGE 'plpgsql' +$$ + DECLARE + bookmark_alpha int; + nw_id int; + open_br_id int; + 
br_hx_tuple history_tuple; + close_br_begin_id int; + r record; + test_id int; + BEGIN + -- Create a bookmark label + INSERT INTO bookmark_labels(network_id, label) + VALUES (34, 'Alpha') + RETURNING bookmark_label_id + INTO STRICT bookmark_alpha; + RAISE NOTICE 'bookmark_labels'; + FOR r IN SELECT * FROM bookmark_labels + LOOP + RAISE NOTICE '%', r; + END LOOP; + -- Create a singleton bookmark at current history point. + RAISE NOTICE 'Create singleton bookmark'; + INSERT INTO bookmark_associations(bookmark_label_id, role) + VALUES (bookmark_alpha, 'singleton'); + + RAISE NOTICE 'bookmark_associations'; + FOR r IN SELECT * FROM bookmark_associations + LOOP + RAISE NOTICE '%', r; + END LOOP; + + -- TEST 1: Bracket with manually provided values. + test_id := 1; + RAISE NOTICE 'TEST % - BEGIN', test_id; + + -- Open a bracket bookmark at current history point. + RAISE NOTICE 'TEST % - Open bracket', test_id; + INSERT INTO bookmark_associations(bookmark_label_id, role, hx_tuple) + VALUES (bookmark_alpha, 'bookmark_begin', hxtk_current_hx_tuple()) + RETURNING bookmark_association_id + INTO STRICT open_br_id; + + -- Insert some gunk. + RAISE NOTICE 'TEST % - Add new network', test_id; + INSERT INTO meta_network(network_name) + VALUES ('Rod Test 1') + RETURNING network_id INTO STRICT nw_id; + + -- Close the open bookmark at current history point. + RAISE NOTICE 'TEST % - Close bracket', test_id; + INSERT INTO bookmark_associations(bookmark_label_id, role, bracket_begin_id) + VALUES (bookmark_alpha, 'bookmark_end', open_br_id) + RETURNING hx_tuple + INTO STRICT r; + RAISE NOTICE 'Close returns: %', r; + IF r.hx_tuple != hxtk_current_hx_tuple() THEN + RAISE 'Close bracket: Hx tuple % is not current', r.hx_tuple; + END IF; + + RAISE NOTICE 'TEST % - END', test_id; + -- END TEST 1 + + -- TEST 2: Bracket with auto-provided values. + test_id := 2; + RAISE NOTICE 'TEST % - BEGIN', test_id; + + -- Open a bracket bookmark at current history point. 
+ RAISE NOTICE 'TEST % - Open bracket', test_id; + INSERT INTO bookmark_associations(bookmark_label_id, role) + VALUES (bookmark_alpha, 'bookmark_begin') + RETURNING bookmark_association_id, hx_tuple + INTO STRICT r; + open_br_id := r.bookmark_association_id; + IF r.hx_tuple != hxtk_current_hx_tuple() THEN + RAISE 'Open bracket: Hx tuple % is not current', r.hx_tuple; + END IF; + + -- Insert some gunk. + RAISE NOTICE 'TEST % - Add new network', test_id; + INSERT INTO meta_network(network_name) + VALUES ('Rod Test 1') + RETURNING network_id INTO STRICT nw_id; + + -- Close the open bookmark at current history point. + -- Let the tf supply the bookmark-begin id. + RAISE NOTICE 'TEST % - Close bracket', test_id; + INSERT INTO bookmark_associations(bookmark_label_id, role) + VALUES (bookmark_alpha, 'bookmark_end') + RETURNING bracket_begin_id, hx_tuple + INTO STRICT r; + IF r.hx_tuple != hxtk_current_hx_tuple() THEN + RAISE 'Close bracket: Hx tuple % is not current', r.hx_tuple; + END IF; + IF r.bracket_begin_id != open_br_id THEN + RAISE 'Close bracket: Expected bracket_begin_id = %, got %', open_br_id, r.bracket_begin_id; + END IF; + + RAISE NOTICE 'TEST % - END', test_id; + -- END TEST 2 + + END; +$$; +ROLLBACK; From 549c5262e90e4270370c1cb4a3757db9efad2ab3 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Tue, 9 Sep 2025 16:14:39 -0700 Subject: [PATCH 10/13] Refine slightly --- docs/bookmarks.md | 46 ++++++++++++++--------------- pycds/orm/functions/bookmarking.py | 46 ----------------------------- pycds/orm/functions/bookmarking.sql | 2 ++ 3 files changed, 25 insertions(+), 69 deletions(-) delete mode 100644 pycds/orm/functions/bookmarking.py diff --git a/docs/bookmarks.md b/docs/bookmarks.md index 5abce085..acc0a3bd 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -33,6 +33,29 @@ _TOC courtesy of [Lucio Paiva](https://luciopaiva.com/markdown-toc/)._ +## Facts and assumptions + +**Facts** + +- History tables are append-only. 
+- Each history table records the changes made to the entire collection *in temporal order of the changes*. +- Each successive update to a collection is recorded by appending a record to its history table; therefore temporal order is also the order by ascending history id. + +**Assumptions** + +- No existing record in a history table is ever modified. + +**Therefore** + +- If a bookmark is associated to a record in a history table, it represents the history of that collection up to that point in time. A bookmark association can be thought of by analogy with a Git tag, in the sense that both are pointers to a specific state of the relevant items. +- Two such bookmark associations, say $B_1$ and $B_2$, bracket a set of changes recorded in the history table. The delta between them is exactly those changes recorded in the history table, in history id order, between $B_1$ (exclusive) and $B_2$ (inclusive). + +**For further consideration** + +- Bookmark associations can be, and most naturally are, stored in order of the association operations, that is, temporally. Therefore we can read out a series of successive changesets simply by examining the bookmark associations in the order they are made. + - However, that is not true if we allow bookmarking of non-latest states, which is probably going to be needed. We already have history, and we won't always anticipate future needs. Hmmm. + - Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These should be consistent across all history tables; verify this thinking. + ## Terminology - **Bookmark**: A named object that designates a *point in history*. 
This is an imprecise usage of the term "bookmark", which is actually two related things, a *bookmark label* and a *bookmark association*: @@ -62,29 +85,6 @@ _TOC courtesy of [Lucio Paiva](https://luciopaiva.com/markdown-toc/)._ - that item has not been deleted ($h_i$ does not have the deleted flag set). - If $H$ contains all records in a history table prior to some point, the LU set represents what that ollection looked like at that point in time. - Given a *valid historical subset*, then the collections of LU records from each history table in the subset give us the state of the history-tracked collections at the point in time represented by the historical subset. -## Facts and assumptions - -**Facts** - -- History tables are append-only. -- Each history table records the changes made to the entire collection *in temporal order of the changes*. -- Each successive update to a collection is recorded by appending a record to its history table; therefore temporal order is also the order by ascending history id. - -**Assumptions** - -- No existing record in a history table is ever modified. - -**Therefore** - -- If a bookmark is associated to a record in a history table, it represents the history of that collection up to that point in time. A bookmark association can be thought of by analogy with a Git tag, in the sense that both are pointers to a specific state of the relevant items. -- Two such bookmark associations, say $B_1$ and $B_2$, bracket a set of changes recorded in the history table. The delta between them is exactly those changes recorded in the history table, in history id order, between $B_1$ (exclusive) and $B_2$ (inclusive). - -**For further consideration** - -- Bookmark associations can be, and most naturally are, stored in order of the association operations, that is, temporally. Therefore we can read out a series of successive changesets simply by examining the bookmark associations in the order they are made. 
- - However, that is not true if we allow bookmarking of non-latest states, which is probably going to be needed. We already have history, and we won't always anticipate future needs. Hmmm. - - Alternative ordering for bookmarks: In the order they occur according to the history table id, that is in history table temporal order. These should be consistent across all history tables; verify this thinking. - ## History operations We'll need a small handful of operations related directly to history records. These form the foundation for bookmark operations. diff --git a/pycds/orm/functions/bookmarking.py b/pycds/orm/functions/bookmarking.py deleted file mode 100644 index e5413f73..00000000 --- a/pycds/orm/functions/bookmarking.py +++ /dev/null @@ -1,46 +0,0 @@ -"""Functions, stored procedures, and trigger functions supporting bookmark operations. - -TODO: Rename to Alembic version. - -TODO: Define ReplaceableProcedure and supporting SQLA components, parallel to - ReplaceableFunction to create stored procedures. - See https://www.postgresql.org/docs/current/sql-createprocedure.html - Maybe a separate branch/PR and migration preceding this one. Yak shaving. - For now, use ReplaceableFunction ... DO NOT FORGET THIS. - -TODO: - Functions. Possibilities - - get LU - - validate history tuple - - ? create bookmark label - - create bookmark association - - create bookmark association now - - bracket updates - - create rollback - Trigger functions: - - -""" - -from pycds.alembic.extensions.replaceable_objects import ReplaceableFunction -from pycds.context import get_schema_name - - -schema_name = get_schema_name() - - -# Get LU history ids. -# This follows the pattern of other hxtk_ utility functions (receives collection name, -# etc.). -# Arguments: -# collection name -# where condition -# -# Returns table of history id's for LU set from collection history satisfying condition. 
-hxtk_get_latest_undeleted_hx_ids = ReplaceableFunction( - -) - - -# hxtk_create_bookmark_label = # Necessary? This is just an insert to the table. - - diff --git a/pycds/orm/functions/bookmarking.sql b/pycds/orm/functions/bookmarking.sql index 757c3782..61735947 100644 --- a/pycds/orm/functions/bookmarking.sql +++ b/pycds/orm/functions/bookmarking.sql @@ -347,6 +347,7 @@ $BODY$ -- CREATE TRIGGER t100_validate_history_tuple -- BEFORE INSERT OR UPDATE -- ON bookmark_associations + -- FOR EACH ROW -- EXECUTE PROCEDURE hxtk_validate_history_tuple() DECLARE BEGIN @@ -372,6 +373,7 @@ $BODY$ -- CREATE TRIGGER t200_check_bracket_end -- BEFORE INSERT OR UPDATE -- ON bookmark_associations + -- FOR EACH ROW -- EXECUTE PROCEDURE hxtk_check_bm_bracket_end() DECLARE bracket_begin_q text := From 9526b3f489c4597b467b9171c41b36e900509d6f Mon Sep 17 00:00:00 2001 From: rod-glover Date: Mon, 15 Sep 2025 10:26:19 -0700 Subject: [PATCH 11/13] Finish updating documentation --- docs/bookmarks.md | 162 ++++++++++++++++++---------------------------- 1 file changed, 63 insertions(+), 99 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index acc0a3bd..a850b8a5 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -2,16 +2,16 @@ ## Table of contents - - [Terminology](#terminology) - [Facts and assumptions](#facts-and-assumptions) - - [History operations](#history-operations) + - [Terminology](#terminology) + - [Basic operations](#basic-operations) + - [Note on history tuple vs `mod_time` for bookmarking](#note-on-history-tuple-vs-mod_time-for-bookmarking) - [Validate a history tuple](#validate-a-history-tuple) - - [Given a set of history records, get the latest undeleted (LU) records](#given-a-set-of-history-records-get-the-latest-undeleted-lu-records) - - [Bookmark operations](#bookmark-operations) + - [Get the latest undeleted (LU) records, given a subset of history records](#get-the-latest-undeleted-lu-records-given-a-subset-of-history-records) - [Create a bookmark 
label](#create-a-bookmark-label) - [Bookmark a point in history (create a bookmark association)](#bookmark-a-point-in-history-create-a-bookmark-association) - [Bracket (or group) a set of updates](#bracket-or-group-a-set-of-updates) - - [Applications](#applications) + - [Applications of bookmarking](#applications-of-bookmarking) - [Efficient grouping of data versions](#efficient-grouping-of-data-versions) - [An unsatisfactory approach](#an-unsatisfactory-approach) - [Bracketing](#bracketing) @@ -23,13 +23,9 @@ - [Example 2: Records inserted or updated in a specific time period](#example-2-records-inserted-or-updated-in-a-specific-time-period) - [Implementation notes](#implementation-notes) - [Tables](#tables) - - [Functions, stored procedures](#functions-stored-procedures) - - [Triggers](#triggers) - - [Metadata support set](#metadata-support-set) - - [Definitions](#definitions) - - [Historical support, $Sh(X)$](#historical-support-shx) - - [Current (or latest) support, $Sc(X)$](#current-or-latest-support-scx) - - [Support at tag, `St(X,T)`](#support-at-tag-stxt) + - [Other discussions](#other-discussions) + - [Updates to existing records: what supporting metadata?](#updates-to-existing-records-what-supporting-metadata) + - [Metadata support](#metadata-support) _TOC courtesy of [Lucio Paiva](https://luciopaiva.com/markdown-toc/)._ @@ -85,9 +81,21 @@ _TOC courtesy of [Lucio Paiva](https://luciopaiva.com/markdown-toc/)._ - that item has not been deleted ($h_i$ does not have the deleted flag set). - If $H$ contains all records in a history table prior to some point, the LU set represents what that ollection looked like at that point in time. - Given a *valid historical subset*, then the collections of LU records from each history table in the subset give us the state of the history-tracked collections at the point in time represented by the historical subset. 
-## History operations +## Basic operations + +### Note on history tuple vs `mod_time` for bookmarking + +We could, in principle, use `mod_time` to define a point in history. A history subset would then be just all history records occurring at or before a given `mod_time`. This would have some advantages: + +- A single value rather than a tuple denotes a point in history. +- Validity (referential integrity) of a historical subset is trivially guaranteed so long as all history records satisfying the `mod_time` condition are included in the subset. There's no need to validate such a time value as there is for a tuple. -We'll need a small handful of operations related directly to history records. These form the foundation for bookmark operations. +However, we cannot always trust times in the database. Rather than fretting over time synchrony, we use the more reliable marker of position that is the history id. In that case, we have to store the history id's for each history table; hence a history tuple, not a single time value. + +***Note***: If all history tables shared the same history id sequence, then a single history id would in fact point unambiguously at a point in history, in the same way as `mod_time` but without the doubts about time. This might be worth considering. Notes: + +- You'd have to obtain from the shared history id sequence (for current point in history) or the history tables (for past points) the largest history id corresponding to your desired point in history. +- You'd need to migrate all the existing tables. For `obs_raw_hx`, this would be a pretty big operation, again, but nothing like as big an operation as the original history table population. ### Validate a history tuple @@ -126,7 +134,6 @@ Again looking a little forward in this document, a common case will be where the ##### Notes - Fidelity to actual history would require that there are no gaps in the set of history records, i.e., that we haven't arbitrarily dropped some from the middle. 
However, that is not strictly necessary for these operations to be performed.) -## Bookmark operations ### Create a bookmark label @@ -152,7 +159,7 @@ Again looking a little forward in this document, a common case will be where the **Operation**: Let $L_1$ and $L_2$ be two bookmarks. Let $U$ be a set of updates. Then the operation $Bracket(L_1, U, L_2)$ is defined as: -- *Within a transaction (i.e., atomically)*: +- *Within a transaction (i.e., in isolation from other operations)*: - Perform $Bookmark(L_1)$. - Perform updates $U$. - Perform $Bookmark(L_2)$. @@ -162,14 +169,21 @@ History records between $L_1$ (exclusive) and $L_2$ (inclusive) are exactly and - their isolation in the transaction (so no other update operations are interleaved); - the fact that change records are appended to the history tables in temporal order of change operations. -**Notes**: -- The above definition is abstract. It's not - -**For further discussion and analysis**: +**Other notes**: - Since the bookmarks for bracketing are directly related, we would probably do better to use a single bookmark label $L$, and allow the system to construct bookmarks $L_1$ and $L_2$ from $L$. In fact, we use the same label, and the bookmark association carries the distinction between $L_1$ and $L_2$. We can then define $Bracket(L, U) = Bracket(L_1, U, L_2)$, where $L_1$ and $L_2$ are the constructed bookmarks. See use of auxiliary columns in bookmark association table in [[#Implementation notes]] below. -## Applications +**Transactions, concurrency, and transaction isolation level** + +Bracketing is useful only if operations occurring outside the transaction in which bracketing is performed cannot be interleaved with the operations occurring inside it. This is called transaction isolation. + +The Postgres documentation for [Transaction Isolation](https://www.postgresql.org/docs/current/transaction-iso.html) states: + +> The SQL standard defines four levels of transaction isolation. 
The most strict is Serializable, which is defined by the standard in a paragraph which says that any concurrent execution of a set of Serializable transactions is guaranteed to produce the same effect as running them one at a time in some order. The other three levels are defined in terms of phenomena, resulting from interaction between concurrent transactions, which must not occur at each level. The standard notes that due to the definition of Serializable, none of these phenomena are possible at that level. (This is hardly surprising -- if the effect of the transactions must be consistent with having been run one at a time, how could you see any phenomena caused by interactions?) + +Therefore not only must we run bracketing within a transaction, the transaction must be run at Serializable isolation level. This is done with the [`SET TRANSACTION`](https://www.postgresql.org/docs/current/sql-set-transaction.html) command within the transaction in question. + +## Applications of bookmarking ### Efficient grouping of data versions @@ -308,24 +322,11 @@ All three considerations described above about what records are germane in the s ## Implementation notes -We begin to see the outlines of an implementation, as follows. +A provisional implementation is on branch [i-239-version-tagging](https://github.com/pacificclimate/pycds/tree/i-239-version-tagging). It includes types, tables, functions, trigger functions. In this section we just summarize and make some remarks and questions. ### Tables -**Table `bookmarks`** - -| Column | Type | Remarks | -| -------------- | ----------- | ------------------------------------------------------------------------- | -| `bookmark_id` | `int` | PK | -| `name` | `text` | | -| `comment` | `text` | Elaboration of meaning or use of the bookmark. Example: "QA release 2021" | -| ? `network_id` | `int` | FK `meta_network`. 
| -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | - -Constraints - -- unique (`name`, `network_id`) +**Table `bookmark_labels`** Questions: @@ -336,30 +337,11 @@ Questions: **Table `bookmark_associations`** -Q: Why separate association from bookmark proper? +Q (rhetorical): Why separate association from bookmark proper? A: To support multiple uses of the same bookmark. - Brackets share the same bookmark info, but are associated as bracket-begin, bracket-end. - We likely want to bracket multiple groups of observations -- e.g., those ingested by `crmprtd` at any one time -- using the same bookmark. -| Column | Type | Remarks | -| ------------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- | -| `bookmark_association_id` | `int` | PK | -| `bookmark_id` | `int` | FK `bookmarks` | -| `role` | ?`enumeration`; ?`int` (FK to role value table); ?`text`. Values: `'singleton'`, `'bracket-begin'`, `'bracket-end'`. | Enumeration type is probably best. | -| `bracket_begin_id` | `int` | FK `bookmark_associations`. See discussion below. | -| `comment` | `text` | Nullable.
Auxiliary information about the association. | -| `obs_raw_hx_id` | `int` | PK `obs_raw_hx` | -| `meta_network_hx_id` | `int` | PK `meta_network_hx` | -| `meta_station_hx_id` | `int` | PK `meta_station_hx` | -| `meta_history_hx_id` | `int` | PK `meta_history_hx` | -| `meta_vars_hx_id` | `int` | PK `meta_vars_hx` | -| `mod_user` | `text` | | -| `mod_time` | `timestamp` | | -Constraints: - -- Tuple validity. -- Trigger function enforces constraint on `bracket_begin_id`. See discussion below. - Questions: 1. Apply history tracking to this table? Reason, utility? @@ -371,70 +353,52 @@ Constraints on bookmark associations and role: - The only constraints on brackets are: - bracket-begin and bracket-end must occur in matching pairs (open brackets, i.e., unmatched bracket-begins, are permitted). - a bracket-end must specify an open (not yet paired) bracket-begin that occurs before (in order of ascending `bookmark_association_id`) the bracket-end. -- We could: +- We can (see trigger function): - Auto-generate `bracket_begin_id` for a bracket-end id when there is only one unpaired bracket-begin (that one's bracket id). This seems a likely scenario. However, it is surplus to requirements if we assume/require the user to carry the auto-generated bracket-begin id. -### Functions, stored procedures - -Since bookmarking is a non-trivial activity, it will be useful to encapsulate its operations in code. There is some question of whether some or all of this should be Python code in the PyCDS repo proper vs. SP's within the database, but we mix 'em all up here in one list. - -1. Create a bookmark association at current time (current state of database). -2. Check tuple validity. -3. Create a bookmark association from a past state (history tuple). Check validity of tuple. -4. Determine support (see [[#Metadata support set]]) of an observation. Result is a valid history tuple. Can then create bookmark association to it. -5. Perform bracketing operation. - 1. Create bookmark(s). - 2.
Create bracket-begin association. - 3. Apply updates. - 4. Create bracket-end association. -### Triggers +## Other discussions -1. Enforce values of `mod_time`, `mod_user` in `bookmark_labels` and `bookmark_associations`. (As for history tracking; reuse tf.) +### Updates to existing records: what supporting metadata? -## Metadata support set +This isn't necessarily a question about bookmarking, or isn't exclusively so. -***Note/TODO***: This section may not be very useful any more ... but I include it for consideration. It may also be overcomplicated ... the support of a set of observations may be more general than is really useful. +Let's use `obs_raw` as the example, as it covers all other cases. -The idea of the "metadata support" may prove useful in talking clearly about bookmarking. In particular, it may prove useful in discussing bookmarking or bracketing a set of observations post hoc. From here on, we may abbreviate "metadata support" to "support". +When we update an existing `obs_raw` record, we can ask what are the relevant metadata history items to link with it in history. -Support enables us to talk in a well-defined, compact way about the metadata relevant to an observation (or set of observations), when the observations are the only handle you have at the outset. More accurately, we should say observation histories, since observations are mutable and not the target of bookmarking. +This question is interesting because now that we have history tracking we also have a new phenomenon, which is that updates to metadata and updates to observations are not interchangeable. Order now matters. -The support of an observation history record $X$ is the set of metadata (history) records directly relevant to $X$, which is to say directly associated to $X$ by one or more FK links away from the observation. This in fact applies to any history record $X$, but observation histories are the most important and are the most general or complex case.
+For any given `obs_raw` record, possible answers are: -### Definitions +1. The current (latest) metadata history record for the metadata item linked to by the `obs_raw` record. + - The simplest answer, and the one that matches what happened before the advent of history tracking. It is also what happens automatically via the history foreign key maintenance trigger whenever a record is created or updated. +2. The metadata history record linked to the existing (un-updated) `obs_raw_hx` history record for that item (this may be the same as the current one, but it equally well may be quite a lot older). + - It is possible to imagine wanting to update a "frozen" state of the database, i.e., to update `obs_raw` based on an earlier state of the database that could be obtained from a history rollback, which would imply using the links to the older states of the relevant metadata items. +3. Some intermediate metadata history record between those for (1) and (2), if any exist. + - This just seems like asking for trouble. -We define 2 particular cases of support that are especially relevant: +There is no *a priori* answer to this question; any answer could be correct. To multiply questions and confusion, this choice applies transitively to all metadata items linked indirectly to the obs_raw record (i.e., station and network). -#### Historical support, $Sh(X)$ +### Metadata support -- The *historical (metadata) support* of observation history record $X$, denoted $Sh(X)$, is the tuple of metadata history records linked to it via history table foreign keys followed directly from one history record to another. -- There is always exactly one of each metadata history record type (Network, Station, Station History, Variable) in this tuple. -- This tuple is the precise metadata state at the time of creation of $X$. -- This tuple *does not change* when updates to the corresponding metadata items are made. 
+This a closely related topic, and maybe can precede the discussion above, but ... -We can easily generalize this to a set $S$ of history records: +The *historical metadata support* of an observation history record $H$ is the set of metadata (history) records directly relevant to $H$, which is to say directly associated to $H$ by one or more FK links away from the observation. +We denote this by $S(H)$. Notes: -- $Sh(S) = \bigcup_{X \in S} Sh(X)$ +- This in fact applies to any history record $H$, but observation histories are the most important and are the most general or complex case. +- We have only unidirectional links to consider at the moment, but if we include many:many relationships in history tracking, then this definition will possibly become a little more complicated. -#### Current (or latest) support, $Sc(X)$ +The historical metadata support of a set of observation history records $K$ is the union of the support of each record $H \in K$. We write $S(K) = \bigcup_{H \in S} S(H)$. -- The *latest (metadata) support* of observation history record $X$, denoted $Sc(X)$, is the set of metadata history records defined as: For each record in $Sh(X)$, use the current latest record for that metadata item; equivalently, use the metadata *item* foreign key to retrieve that record from the primary table. -- There is always exactly one of each metadata history record type (Network, Station, Station History, Variable) in this set. -- This set provides the *current state* of metadata relevant to $X$, with all updates to those items. This set *changes* when the corresponding metadata items are updated, and is not fixed over time. +When we select a set of observations to be bookmarked after the fact, that is after the database has experienced more changes, we need to also include the metadata support of that set. This will make bookmarking after the fact somewhat trickier. 
-We can easily generalize this to a set $S$ of history records: +As we note above, we can think about the *current metadata support* for history records, which would be those records linked via item id, not by history id. -- $Sc(S) = \bigcup_{X \in S} Sc(X)$ +This is really just a restatement of the considerations above. -#### Support at tag, `St(X,T)` -Is this still relevant? It seems that if we define bookmarking as an association to a tuple of history records, then this whole thing is redundant. -A slightly less self-evident case of support -- The support set of observation history record $X$ at tag `T`, denoted `St(X,T)`, is the set of metadata history records defined by: For each record in $Sh(X)$, use the metadata history record tagged by `T` for that metadata item. -- There may be no such metadata history record for some or all of the elements of $Sc(X)$. Therefore `St(X,T)` may not contain one item for every metadata record type. -- Tag `T` can tag *any* metadata history record in an item's history set. Therefore the elements of `St(X,T)` may occur *before* the historical support items for $X$. This may or may not make sense in any given context. -It is possible to define other support sets with different criteria for what metadata history records are included, but defining the criteria so that they are consistent and make sense is harder. We do not offer any other definitions here. \ No newline at end of file From 0063f880bcb1b7533011c8094920749fb57e40a5 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Tue, 23 Sep 2025 14:42:38 -0700 Subject: [PATCH 12/13] Minor corrections --- docs/bookmarks.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/bookmarks.md b/docs/bookmarks.md index a850b8a5..cd5c6d1b 100644 --- a/docs/bookmarks.md +++ b/docs/bookmarks.md @@ -384,13 +384,12 @@ There is no *a priori* answer to this question; any answer could be correct. 
To This a closely related topic, and maybe can precede the discussion above, but ... -The *historical metadata support* of an observation history record $H$ is the set of metadata (history) records directly relevant to $H$, which is to say directly associated to $H$ by one or more FK links away from the observation. -We denote this by $S(H)$. Notes: +The *historical metadata support* of an observation history record $H$ is the set of metadata (history) records directly relevant to $H$, which is to say directly associated to $H$ by one or more FK links away from the observation. We denote this by $S(H)$. Notes: - This in fact applies to any history record $H$, but observation histories are the most important and are the most general or complex case. - We have only unidirectional links to consider at the moment, but if we include many:many relationships in history tracking, then this definition will possibly become a little more complicated. -The historical metadata support of a set of observation history records $K$ is the union of the support of each record $H \in K$. We write $S(K) = \bigcup_{H \in S} S(H)$. +The historical metadata support of a set of observation history records $K$ is the union of the support of each record $H \in K$. We write $S(K) = \bigcup_{H \in K} S(H)$. When we select a set of observations to be bookmarked after the fact, that is after the database has experienced more changes, we need to also include the metadata support of that set. This will make bookmarking after the fact somewhat trickier. 
From fe42fe6db95baedc7022ce026f8703b8204c0458 Mon Sep 17 00:00:00 2001 From: rod-glover Date: Wed, 24 Sep 2025 15:58:06 -0700 Subject: [PATCH 13/13] Update hx tracking overview diagram --- docs/history-tracking-overview.excalidraw | 643 +++++++++++----------- 1 file changed, 313 insertions(+), 330 deletions(-) diff --git a/docs/history-tracking-overview.excalidraw b/docs/history-tracking-overview.excalidraw index 304e592d..8eef118f 100644 --- a/docs/history-tracking-overview.excalidraw +++ b/docs/history-tracking-overview.excalidraw @@ -1,12 +1,12 @@ { "type": "excalidraw", "version": 2, - "source": "https://github.com/zsviczian/obsidian-excalidraw-plugin/releases/tag/2.6.7", + "source": "https://github.com/zsviczian/obsidian-excalidraw-plugin/releases/tag/2.14.3", "elements": [ { "type": "rectangle", - "version": 746, - "versionNonce": 1634433902, + "version": 987, + "versionNonce": 2019939040, "index": "aM", "isDeleted": false, "id": "tPneSgy9-ER8wbYV3t7rf", @@ -16,10 +16,10 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 343.98140153924237, - "y": -434.51111201846356, + "x": 927.593862912328, + "y": -177.62186620039597, "strokeColor": "#1e1e1e", - "backgroundColor": "#a5d8ff", + "backgroundColor": "transparent", "width": 212.41664296929588, "height": 199.99999999999991, "seed": 1682344250, @@ -38,7 +38,7 @@ "type": "arrow" } ], - "updated": 1730220119371, + "updated": 1758744681397, "link": null, "locked": false, "customData": { @@ -47,8 +47,8 @@ }, { "type": "text", - "version": 629, - "versionNonce": 1481681390, + "version": 869, + "versionNonce": 1441298144, "index": "aMV", "isDeleted": false, "id": "Dw1n5NgF", @@ -58,8 +58,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 389.96148205709346, - "y": -429.51111201846356, + "x": 973.573943430179, + "y": -172.62186620039597, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", "width": 120.45648193359375, @@ -69,7 +69,7 @@ "frameId": null, "roundness": null, "boundElements": [], - "updated": 
1730220125332, + "updated": 1758744681397, "link": null, "locked": false, "fontSize": 28, @@ -85,8 +85,8 @@ }, { "type": "rectangle", - "version": 914, - "versionNonce": 1070822795, + "version": 1156, + "versionNonce": 60833504, "index": "aMVV", "isDeleted": false, "id": "HP8dFO0w6b8dQ6Er-z3Xz", @@ -96,10 +96,10 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 338.94226039163414, - "y": 1066.3646562997797, + "x": 922.5547217647197, + "y": 1323.2539021178472, "strokeColor": "#1e1e1e", - "backgroundColor": "#a5d8ff", + "backgroundColor": "#e9ecef", "width": 219.58143314329794, "height": 351, "seed": 1832701158, @@ -130,7 +130,7 @@ "type": "arrow" } ], - "updated": 1754949056677, + "updated": 1758744681397, "link": null, "locked": false, "customData": { @@ -139,8 +139,8 @@ }, { "type": "text", - "version": 819, - "versionNonce": 679644203, + "version": 1063, + "versionNonce": 1727357664, "index": "aMW", "isDeleted": false, "id": "i3rgw8Kk", @@ -150,35 +150,35 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 365.99667172646673, - "y": 1071.3646562997797, + "x": 949.6891044130289, + "y": 1328.2539021178472, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 165.4726104736328, - "height": 302.40000000000003, + "width": 165.3126678466797, + "height": 340.20000000000005, "seed": 728718458, "groupIds": [], "frameId": null, "roundness": null, "boundElements": [], - "updated": 1754949056677, + "updated": 1758744681397, "link": null, "locked": false, "fontSize": 28, "fontFamily": 6, - "text": " a_hx\n-------\na_hx_id (PK)\ndeleted\nmod_time\nmod_user\na_id\nx", - "rawText": "
a_hx\n-------\na_hx_id (PK)\ndeleted\nmod_time\nmod_user\na_id\nx", + "text": "
a_hx\n-------\na_id\nx\na_hx_id (PK)\ndeleted\nmod_time\nmod_user\n", + "rawText": "
a_hx\n-------\na_id\nx\na_hx_id (PK)\ndeleted\nmod_time\nmod_user\n", "textAlign": "center", "verticalAlign": "top", "containerId": "HP8dFO0w6b8dQ6Er-z3Xz", - "originalText": "
a_hx\n-------\na_hx_id (PK)\ndeleted\nmod_time\nmod_user\na_id\nx", + "originalText": "
a_hx\n-------\na_id\nx\na_hx_id (PK)\ndeleted\nmod_time\nmod_user\n", "autoResize": true, "lineHeight": 1.35 }, { "type": "rectangle", - "version": 882, - "versionNonce": 1702811531, + "version": 1123, + "versionNonce": 2128320224, "index": "aMWG", "isDeleted": false, "id": "HENr2olXzuzzezNs3HaA4", @@ -188,10 +188,10 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 343.74295363253555, - "y": 224.76059477682293, + "x": 927.3554150056211, + "y": 481.6498405948905, "strokeColor": "#1e1e1e", - "backgroundColor": "#a5d8ff", + "backgroundColor": "transparent", "width": 219.58143314329794, "height": 351, "seed": 526137786, @@ -222,7 +222,7 @@ "type": "arrow" } ], - "updated": 1754948758749, + "updated": 1758744681397, "link": null, "locked": false, "customData": { @@ -231,8 +231,8 @@ }, { "type": "text", - "version": 838, - "versionNonce": 1048482021, + "version": 1082, + "versionNonce": 285588192, "index": "aMWV", "isDeleted": false, "id": "x9CH03On", @@ -242,35 +242,35 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 389.8434312515478, - "y": 229.76059477682293, + "x": 974.4818707130123, + "y": 486.6498405948905, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 127.38047790527344, - "height": 226.8, + "width": 125.32852172851562, + "height": 264.6, "seed": 1055446758, "groupIds": [], "frameId": null, "roundness": null, "boundElements": [], - "updated": 1754948786518, + "updated": 1758744681397, "link": null, "locked": false, "fontSize": 28, "fontFamily": 6, - "text": "
a\n-------\nmod_time\nmod_user\na_id (PK)\nx", - "rawText": "
a\n-------\nmod_time\nmod_user\na_id (PK)\nx", + "text": "
a\n-------\na_id (PK)\nx\nmod_time\nmod_user\n", + "rawText": "
a\n-------\na_id (PK)\nx\nmod_time\nmod_user\n", "textAlign": "center", "verticalAlign": "top", "containerId": "HENr2olXzuzzezNs3HaA4", - "originalText": "
a\n-------\nmod_time\nmod_user\na_id (PK)\nx", + "originalText": "
a\n-------\na_id (PK)\nx\nmod_time\nmod_user\n", "autoResize": true, "lineHeight": 1.35 }, { "type": "rectangle", - "version": 870, - "versionNonce": 1977336302, + "version": 1111, + "versionNonce": 1549880032, "index": "aMX", "isDeleted": false, "id": "rPi80OCsDY1oxdpcLjLi4", @@ -280,10 +280,10 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 863.7050453468169, - "y": -445.9407018992092, + "x": 1447.3175067199024, + "y": -189.05145608114162, "strokeColor": "#1e1e1e", - "backgroundColor": "#a5d8ff", + "backgroundColor": "transparent", "width": 198.08706262129178, "height": 219.66897746967072, "seed": 192480634, @@ -302,7 +302,7 @@ "type": "arrow" } ], - "updated": 1728343910542, + "updated": 1758744681398, "link": null, "locked": false, "customData": { @@ -311,8 +311,8 @@ }, { "type": "text", - "version": 756, - "versionNonce": 503779694, + "version": 996, + "versionNonce": 464842464, "index": "aMZ", "isDeleted": false, "id": "Nwc49WhE", @@ -322,18 +322,18 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 901.4823641940839, - "y": -440.9407018992092, + "x": 1485.3767927302554, + "y": -184.05145608114162, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 122.53242492675781, + "width": 121.96849060058594, "height": 189.00000000000003, "seed": 1218422566, "groupIds": [], "frameId": null, "roundness": null, "boundElements": [], - "updated": 1730220133520, + "updated": 1758744681398, "link": null, "locked": false, "fontSize": 28, @@ -349,8 +349,8 @@ }, { "type": "rectangle", - "version": 1016, - "versionNonce": 1164874059, + "version": 1260, + "versionNonce": 716093152, "index": "aMa", "isDeleted": false, "id": "cXv2VOhBtuqYgae3ODq6c", @@ -360,12 +360,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 859.8526446365463, - "y": 1051.4743301288206, + "x": 1443.4651060096319, + "y": 1308.3635759468882, "strokeColor": "#1e1e1e", - "backgroundColor": "#a5d8ff", + "backgroundColor": "#e9ecef", "width": 219.58143314329797, - "height": 389, + 
"height": 426, "seed": 186531878, "groupIds": [], "frameId": null, @@ -386,7 +386,7 @@ "type": "arrow" } ], - "updated": 1754949056679, + "updated": 1758744681398, "link": null, "locked": false, "customData": { @@ -395,8 +395,8 @@ }, { "type": "text", - "version": 935, - "versionNonce": 1271148523, + "version": 1179, + "versionNonce": 359979744, "index": "aMb", "isDeleted": false, "id": "c0WKu6Wy", @@ -406,35 +406,35 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 886.4450503561445, - "y": 1056.4743301288206, + "x": 1470.1374830427067, + "y": 1313.3635759468882, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 166.39662170410156, - "height": 378.00000000000006, + "width": 166.23667907714844, + "height": 415.80000000000007, "seed": 1828830522, "groupIds": [], "frameId": null, "roundness": null, "boundElements": [], - "updated": 1754949056679, + "updated": 1758744681398, "link": null, "locked": false, "fontSize": 28, "fontFamily": 6, - "text": "
b_hx\n-------\nb_hx_id (PK)\ndeleted\nmod_time\nmod_user\nb_id\na_id\ny\na_hx_id (FK)", - "rawText": "
b_hx\n-------\nb_hx_id (PK)\ndeleted\nmod_time\nmod_user\nb_id\na_id\ny\na_hx_id (FK)", + "text": "
b_hx\n-------\nb_id\na_id\ny\na_hx_id (FK)\nb_hx_id (PK)\ndeleted\nmod_time\nmod_user\n", + "rawText": "
b_hx\n-------\nb_id\na_id\ny\na_hx_id (FK)\nb_hx_id (PK)\ndeleted\nmod_time\nmod_user\n", "textAlign": "center", "verticalAlign": "top", "containerId": "cXv2VOhBtuqYgae3ODq6c", - "originalText": "
b_hx\n-------\nb_hx_id (PK)\ndeleted\nmod_time\nmod_user\nb_id\na_id\ny\na_hx_id (FK)", + "originalText": "
b_hx\n-------\nb_id\na_id\ny\na_hx_id (FK)\nb_hx_id (PK)\ndeleted\nmod_time\nmod_user\n", "autoResize": true, "lineHeight": 1.35 }, { "type": "rectangle", - "version": 1071, - "versionNonce": 381003140, + "version": 1312, + "versionNonce": 195839712, "index": "aMbV", "isDeleted": false, "id": "jvtpsSfKxVMT4c7AFLiba", @@ -444,10 +444,10 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 861.1871333713818, - "y": 210.35439734173815, + "x": 1444.7995947444674, + "y": 467.24364315980574, "strokeColor": "#1e1e1e", - "backgroundColor": "#a5d8ff", + "backgroundColor": "transparent", "width": 219.58143314329797, "height": 365.34142114384747, "seed": 568390266, @@ -474,7 +474,7 @@ "type": "arrow" } ], - "updated": 1729117567263, + "updated": 1758744681398, "link": null, "locked": false, "customData": { @@ -483,8 +483,8 @@ }, { "type": "text", - "version": 989, - "versionNonce": 1442679301, + "version": 1233, + "versionNonce": 541092576, "index": "aMc", "isDeleted": false, "id": "fziB80tY", @@ -494,35 +494,35 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 907.2876109903941, - "y": 215.35439734173815, + "x": 1491.9260504518586, + "y": 472.24364315980574, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 127.38047790527344, - "height": 264.6, + "width": 125.32852172851562, + "height": 302.40000000000003, "seed": 515443238, "groupIds": [], "frameId": null, "roundness": null, "boundElements": [], - "updated": 1754948813569, + "updated": 1758744681398, "link": null, "locked": false, "fontSize": 28, "fontFamily": 6, - "text": "
b\n-------\nmod_time\nmod_user\nb_id (PK)\na_id (FK)\ny", - "rawText": "
b\n-------\nmod_time\nmod_user\nb_id (PK)\na_id (FK)\ny", + "text": "
b\n-------\nb_id (PK)\na_id (FK)\ny\nmod_time\nmod_user\n", + "rawText": "
b\n-------\nb_id (PK)\na_id (FK)\ny\nmod_time\nmod_user\n", "textAlign": "center", "verticalAlign": "top", "containerId": "jvtpsSfKxVMT4c7AFLiba", - "originalText": "
b\n-------\nmod_time\nmod_user\nb_id (PK)\na_id (FK)\ny", + "originalText": "
b\n-------\nb_id (PK)\na_id (FK)\ny\nmod_time\nmod_user\n", "autoResize": true, "lineHeight": 1.35 }, { "type": "arrow", - "version": 986, - "versionNonce": 1229347781, + "version": 1732, + "versionNonce": 1302351648, "index": "aQ", "isDeleted": false, "id": "BO_UhaJWAx-Ax-bvbriTA", @@ -532,12 +532,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 862.7050453468169, - "y": -348.4441176334223, + "x": 1446.3175067199022, + "y": -91.5549784257442, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 305.30700083827855, - "height": 1.148173945832582, + "width": 305.3070008382783, + "height": 1.1484677255783708, "seed": 540007590, "groupIds": [], "frameId": null, @@ -550,7 +550,7 @@ "id": "zfWxOAqa" } ], - "updated": 1754948583238, + "updated": 1758744681401, "link": null, "locked": false, "customData": { @@ -558,15 +558,13 @@ }, "startBinding": { "elementId": "rPi80OCsDY1oxdpcLjLi4", - "focus": 0.10853823050535973, - "gap": 1, - "fixedPoint": null + "focus": 0.10853823050535995, + "gap": 1.0000000000002274 }, "endBinding": { "elementId": "tPneSgy9-ER8wbYV3t7rf", - "focus": -0.1542324610144023, - "gap": 1, - "fixedPoint": null + "focus": -0.15423244283818158, + "gap": 1 }, "lastCommittedPoint": null, "startArrowhead": null, @@ -577,8 +575,8 @@ 0 ], [ - -305.30700083827855, - -1.148173945832582 + -305.3070008382783, + -1.1484677255783708 ] ], "elbowed": false @@ -597,7 +595,7 @@ "opacity": 100, "angle": 0, "x": 654.6813132992596, - "y": -367.9142912336098, + "y": -367.9184435206989, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", "width": 110.74046325683594, @@ -623,8 +621,8 @@ }, { "type": "arrow", - "version": 1381, - "versionNonce": 88859147, + "version": 2130, + "versionNonce": 2096039712, "index": "aS", "isDeleted": false, "id": "O1gYgG5p1CopkbDz1giYX", @@ -634,12 +632,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 858.8526446365463, - "y": 1248.5996463939005, + "x": 1442.4651060096319, + "y": 1519.1236403121113, 
"strokeColor": "#1e1e1e", - "backgroundColor": "transparent", - "width": 299.32895110161417, - "height": 2.6746973492856796, + "backgroundColor": "#e9ecef", + "width": 299.32895110161405, + "height": 10.956798917842661, "seed": 720325478, "groupIds": [], "frameId": null, @@ -652,7 +650,7 @@ "type": "text" } ], - "updated": 1754949057127, + "updated": 1758744681402, "link": null, "locked": false, "customData": { @@ -660,15 +658,13 @@ }, "startBinding": { "elementId": "cXv2VOhBtuqYgae3ODq6c", - "focus": -0.008365668725549101, - "gap": 1, - "fixedPoint": null + "focus": -0.00836566872555028, + "gap": 1 }, "endBinding": { "elementId": "HP8dFO0w6b8dQ6Er-z3Xz", - "focus": 0.05895110255980628, - "gap": 1, - "fixedPoint": null + "focus": 0.029843641840676307, + "gap": 1.0000000000002274 }, "lastCommittedPoint": null, "startArrowhead": null, @@ -679,8 +675,8 @@ 0 ], [ - -299.32895110161417, - 2.6746973492856796 + -299.32895110161405, + -10.956798917842661 ] ], "elbowed": false @@ -698,8 +694,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 631.2078452942353, - "y": 1015.9676576584678, + "x": 631.2078452942354, + "y": 1231.0389182963413, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", "width": 155.9606475830078, @@ -725,8 +721,8 @@ }, { "type": "line", - "version": 127, - "versionNonce": 2108110054, + "version": 367, + "versionNonce": 1614913248, "index": "aU", "isDeleted": false, "id": "vKiMNdtNFZcAmvLj0rRGx", @@ -736,8 +732,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": -48.8823757856519, - "y": -42.93420706373922, + "x": 534.7300855874337, + "y": 213.95503875432837, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", "width": 1590.987868284229, @@ -749,7 +745,7 @@ "type": 2 }, "boundElements": [], - "updated": 1727915590548, + "updated": 1758744681398, "link": null, "locked": false, "startBinding": null, @@ -766,12 +762,13 @@ 1590.987868284229, 0 ] - ] + ], + "polygon": false }, { "type": "arrow", - "version": 363, - "versionNonce": 
1162827307, + "version": 1112, + "versionNonce": 1106231072, "index": "aV", "isDeleted": false, "id": "i9Z7cwn19gcx7vYmw0vuL", @@ -781,11 +778,11 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 444.60469361158533, - "y": 582.2315775452738, + "x": 1028.0631607523278, + "y": 839.1208233633414, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 1.1557339928377814, + "width": 1.266889807974394, "height": 478.98707712087696, "seed": 812761446, "groupIds": [], @@ -794,20 +791,18 @@ "type": 2 }, "boundElements": [], - "updated": 1754949057126, + "updated": 1758744681402, "link": null, "locked": false, "startBinding": { "elementId": "HENr2olXzuzzezNs3HaA4", - "focus": 0.08674691154783774, - "gap": 6.470982768450881, - "fixedPoint": null + "focus": 0.08674691154783735, + "gap": 6.470982768450881 }, "endBinding": { "elementId": "HP8dFO0w6b8dQ6Er-z3Xz", - "focus": -0.023015858566024266, - "gap": 5.1460016336288845, - "fixedPoint": null + "focus": -0.023015673089675795, + "gap": 5.1460016336288845 }, "lastCommittedPoint": null, "startArrowhead": null, @@ -818,7 +813,7 @@ 0 ], [ - 1.1557339928377814, + 1.266889807974394, 478.98707712087696 ] ], @@ -826,8 +821,8 @@ }, { "type": "arrow", - "version": 423, - "versionNonce": 1832478891, + "version": 1172, + "versionNonce": 1478653728, "index": "aX", "isDeleted": false, "id": "9WMVthyuDpbCAfjaELGd8", @@ -837,12 +832,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 965.2318796042238, - "y": 577.0846341134422, + "x": 1549.237243028767, + "y": 833.9738799315098, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 2.5712294357098244, - "height": 473.3896960153784, + "width": 2.7824361996333664, + "height": 473.38969601537815, "seed": 1501286074, "groupIds": [], "frameId": null, @@ -850,20 +845,18 @@ "type": 2 }, "boundElements": [], - "updated": 1754949057128, + "updated": 1758744681403, "link": null, "locked": false, "startBinding": { "elementId": "jvtpsSfKxVMT4c7AFLiba", - "focus": 
0.03826116902876437, - "gap": 1.3888156278566157, - "fixedPoint": null + "focus": 0.038526562110838374, + "gap": 1.3888156278566157 }, "endBinding": { "elementId": "cXv2VOhBtuqYgae3ODq6c", - "focus": -0.07257358327285979, - "gap": 1, - "fixedPoint": null + "focus": -0.07257427349806503, + "gap": 1.0000000000002274 }, "lastCommittedPoint": null, "startArrowhead": null, @@ -874,16 +867,16 @@ 0 ], [ - -2.5712294357098244, - 473.3896960153784 + -2.7824361996333664, + 473.38969601537815 ] ], "elbowed": false }, { "type": "arrow", - "version": 293, - "versionNonce": 540064997, + "version": 1043, + "versionNonce": 453455648, "index": "aY", "isDeleted": false, "id": "BBN5edyBWznLdB7RjCxyA", @@ -893,12 +886,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 857.5301025505701, - "y": 393.7846113679031, + "x": 1441.1425639236559, + "y": 650.6373730668962, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 293.2057157747365, - "height": 0.038224894892039174, + "width": 293.20571577473663, + "height": 0.06444296667461913, "seed": 988830566, "groupIds": [], "frameId": null, @@ -911,7 +904,7 @@ "id": "WW6l9VVG" } ], - "updated": 1754948858850, + "updated": 1758744681403, "link": null, "locked": false, "customData": { @@ -919,15 +912,13 @@ }, "startBinding": { "elementId": "jvtpsSfKxVMT4c7AFLiba", - "focus": -0.0038210336795078043, - "gap": 3.6570308208117694, - "fixedPoint": null + "focus": -0.0038210434833155704, + "gap": 3.657030820811542 }, "endBinding": { "elementId": "HENr2olXzuzzezNs3HaA4", - "focus": -0.036597099884290454, - "gap": 1, - "fixedPoint": null + "focus": -0.03659709988428956, + "gap": 1 }, "lastCommittedPoint": null, "startArrowhead": null, @@ -938,8 +929,8 @@ 0 ], [ - -293.2057157747365, - 0.038224894892039174 + -293.20571577473663, + 0.06444296667461913 ] ], "elbowed": false @@ -957,8 +948,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 655.5570130347838, - "y": 375.98696338208765, + "x": 655.557013034784, + "y": 
374.8821877418291, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", "width": 110.74046325683594, @@ -984,8 +975,8 @@ }, { "type": "arrow", - "version": 554, - "versionNonce": 1671208197, + "version": 1303, + "versionNonce": 102623008, "index": "aa", "isDeleted": false, "id": "xPf51myhWpmHyctO8gD4x", @@ -995,12 +986,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 338.7429536325356, - "y": 392.00095225398957, + "x": 922.3554150056211, + "y": 648.8901980720573, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", "width": 287.1053987422777, - "height": 798.7524746782004, + "height": 798.7524746782003, "seed": 1544236326, "groupIds": [], "frameId": null, @@ -1013,7 +1004,7 @@ "id": "48I2Dkoy" } ], - "updated": 1754949804104, + "updated": 1758744681404, "link": null, "locked": false, "customData": { @@ -1021,15 +1012,13 @@ }, "startBinding": { "elementId": "HENr2olXzuzzezNs3HaA4", - "focus": 0.4511744623232074, - "gap": 4.999999999999943, - "fixedPoint": null + "focus": 0.451174462323208, + "gap": 5 }, "endBinding": { "elementId": "HP8dFO0w6b8dQ6Er-z3Xz", - "focus": -0.4024209265827109, - "gap": 5, - "fixedPoint": null + "focus": -0.402420926582711, + "gap": 5 }, "lastCommittedPoint": null, "startArrowhead": null, @@ -1041,11 +1030,11 @@ ], [ -287.1053987422777, - 312.03191173946357 + 312.0319117394635 ], [ - -4.8006932409015235, - 798.7524746782004 + -4.800693240901467, + 798.7524746782003 ] ], "elbowed": false @@ -1063,11 +1052,11 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": -89.93699955310149, + "x": -89.4130279954843, "y": 609.5328639934531, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 283.14910888671875, + "width": 282.1011657714844, "height": 189.00000000000003, "seed": 1345294886, "groupIds": [], @@ -1090,8 +1079,8 @@ }, { "type": "arrow", - "version": 1601, - "versionNonce": 1628254693, + "version": 2096, + "versionNonce": 887529248, "index": "ac", "isDeleted": false, "id": "Ejrady7_F26x59KCnQm8d", @@ 
-1101,12 +1090,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 335.38219565178395, - "y": 1244.269294876459, + "x": 918.9946570248694, + "y": 1496.1162351404553, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 353.472075952014, - "height": 141.53382450484986, + "width": 353.47207595201394, + "height": 141.5338245048497, "seed": 333390054, "groupIds": [], "frameId": null, @@ -1119,7 +1108,7 @@ "id": "FaC5v5iR" } ], - "updated": 1754949819020, + "updated": 1758744681405, "link": null, "locked": false, "customData": { @@ -1127,15 +1116,13 @@ }, "startBinding": { "elementId": "HP8dFO0w6b8dQ6Er-z3Xz", - "focus": 0.10225909254202926, - "gap": 3.5600647398501906, - "fixedPoint": null + "focus": 0.10225909254202985, + "gap": 3.5600647398503042 }, "endBinding": { "elementId": "HP8dFO0w6b8dQ6Er-z3Xz", - "focus": -0.8259143012677729, - "gap": 4.950333241940314, - "fixedPoint": null + "focus": -0.825914301267772, + "gap": 4.9503332419403705 }, "lastCommittedPoint": null, "startArrowhead": null, @@ -1143,15 +1130,15 @@ "points": [ [ 0, - -5.042305554071309 + 0 ], [ - -353.472075952014, - 47.944763915763815 + -353.47207595201394, + 52.98706946983512 ], [ - -1.390268502090123, - 136.49151895077856 + -1.3902685020900662, + 141.5338245048497 ] ], "elbowed": false @@ -1169,11 +1156,11 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": -221.17336880533645, - "y": 1123.8771041804334, + "x": -154.3944609887066, + "y": 1235.5140587922228, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", - "width": 273.13311767578125, + "width": 272.6091613769531, "height": 113.4, "seed": 1010913190, "groupIds": [], @@ -1196,8 +1183,8 @@ }, { "type": "text", - "version": 388, - "versionNonce": 1552724613, + "version": 628, + "versionNonce": 183826144, "index": "ag", "isDeleted": false, "id": "kn3WsXHO", @@ -1207,8 +1194,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": -39.25997794834933, - "y": -199.8680659040105, + "x": 544.3524834247362, + "y": 
57.021179914057086, "strokeColor": "#1971c2", "backgroundColor": "transparent", "width": 270.3639514731369, @@ -1218,7 +1205,7 @@ "frameId": null, "roundness": null, "boundElements": [], - "updated": 1754949490445, + "updated": 1758744681398, "link": null, "locked": false, "fontSize": 36, @@ -1234,8 +1221,8 @@ }, { "type": "text", - "version": 478, - "versionNonce": 1290640683, + "version": 718, + "versionNonce": 317922016, "index": "ah", "isDeleted": false, "id": "uVJ7NM4f", @@ -1245,8 +1232,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": -41.86434236811351, - "y": 34.088158010816244, + "x": 541.7481190049721, + "y": 290.97740382888384, "strokeColor": "#1971c2", "backgroundColor": "transparent", "width": 270.3639514731369, @@ -1256,7 +1243,7 @@ "frameId": null, "roundness": null, "boundElements": [], - "updated": 1754949502482, + "updated": 1758744681398, "link": null, "locked": false, "fontSize": 36, @@ -1272,8 +1259,8 @@ }, { "type": "rectangle", - "version": 148, - "versionNonce": 1638070650, + "version": 388, + "versionNonce": 1066188512, "index": "am", "isDeleted": false, "id": "x7CuDh-Mp7TNl7l5vQaSc", @@ -1283,8 +1270,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 322.00150636339504, - "y": -86.26176338956259, + "x": 905.6139677364806, + "y": 170.627482428505, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", "width": 258.23223570190635, @@ -1309,7 +1296,7 @@ "type": "arrow" } ], - "updated": 1728330182457, + "updated": 1758744681398, "link": null, "locked": false, "customData": { @@ -1318,8 +1305,8 @@ }, { "type": "text", - "version": 86, - "versionNonce": 1952622970, + "version": 326, + "versionNonce": 1901191904, "index": "an", "isDeleted": false, "id": "WQK7DSMb", @@ -1329,8 +1316,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 365.21325867479743, - "y": -65.30041156980522, + "x": 948.825720047883, + "y": 191.58883424826237, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", "width": 171.80873107910156, @@ -1342,7 
+1329,7 @@ "frameId": null, "roundness": null, "boundElements": [], - "updated": 1728341083453, + "updated": 1758744681398, "link": null, "locked": false, "fontSize": 28, @@ -1358,8 +1345,8 @@ }, { "type": "arrow", - "version": 35, - "versionNonce": 1794213, + "version": 528, + "versionNonce": 419805984, "index": "ao", "isDeleted": false, "id": "73M37a7X001oZycu-OGrM", @@ -1369,12 +1356,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 445.0517663287328, - "y": -93.19417240169423, + "x": 1028.6642277018184, + "y": 163.69507341637336, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", - "width": 0, - "height": 43.32755632582325, + "width": 5.684341886080802e-14, + "height": 43.32755632582327, "seed": 1786945274, "groupIds": [ "ZrfqJU2aDhETO8gi_Mw_v" @@ -1384,14 +1371,13 @@ "type": 2 }, "boundElements": [], - "updated": 1754948583239, + "updated": 1758744681405, "link": null, "locked": false, "startBinding": { "elementId": "x7CuDh-Mp7TNl7l5vQaSc", - "focus": -0.046979865771813255, - "gap": 6.932409012131643, - "fixedPoint": null + "focus": -0.046979865771814074, + "gap": 6.932409012131643 }, "endBinding": null, "lastCommittedPoint": null, @@ -1403,16 +1389,16 @@ 0 ], [ - 0, - -43.32755632582325 + 5.684341886080802e-14, + -43.32755632582327 ] ], "elbowed": false }, { "type": "arrow", - "version": 135, - "versionNonce": 697435141, + "version": 628, + "versionNonce": 1658935072, "index": "ap", "isDeleted": false, "id": "NUtdy3OrdfhJ9-YLFFiFs", @@ -1422,8 +1408,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 446.3862550635681, - "y": -5.539059750047841, + "x": 1029.9987164366537, + "y": 251.35018606801975, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", "width": 0, @@ -1437,14 +1423,13 @@ "type": 2 }, "boundElements": [], - "updated": 1754948583239, + "updated": 1758744681405, "link": null, "locked": false, "startBinding": { "elementId": "x7CuDh-Mp7TNl7l5vQaSc", - "focus": 0.03664429530201499, - "gap": 1, - "fixedPoint": null + "focus": 
0.03664429530201405, + "gap": 1 }, "endBinding": null, "lastCommittedPoint": null, @@ -1464,8 +1449,8 @@ }, { "type": "rectangle", - "version": 195, - "versionNonce": 1800278778, + "version": 435, + "versionNonce": 2137528032, "index": "aq", "isDeleted": false, "id": "YDN2CFYJD-5V9zKvieZpl", @@ -1475,8 +1460,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 843.5226683866961, - "y": -95.32588817292479, + "x": 1427.1351297597816, + "y": 161.5633576451428, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", "width": 258.23223570190635, @@ -1501,7 +1486,7 @@ "type": "arrow" } ], - "updated": 1728341085248, + "updated": 1758744681398, "link": null, "locked": false, "customData": { @@ -1510,8 +1495,8 @@ }, { "type": "text", - "version": 132, - "versionNonce": 1399170022, + "version": 372, + "versionNonce": 1275214560, "index": "ar", "isDeleted": false, "id": "NBLRrH2O", @@ -1521,8 +1506,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 885.9784163646024, - "y": -74.36453635316742, + "x": 1469.590877737688, + "y": 182.52470946490018, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", "width": 173.32073974609375, @@ -1534,7 +1519,7 @@ "frameId": null, "roundness": null, "boundElements": [], - "updated": 1728341088843, + "updated": 1758744681398, "link": null, "locked": false, "fontSize": 28, @@ -1550,8 +1535,8 @@ }, { "type": "arrow", - "version": 129, - "versionNonce": 1729968997, + "version": 622, + "versionNonce": 652833568, "index": "as", "isDeleted": false, "id": "NKl5DUMgqWBZXuYbbht9J", @@ -1561,12 +1546,12 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 966.5729283520338, - "y": -102.25829718505645, + "x": 1550.1853897251194, + "y": 154.63094863301117, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", - "width": 0, - "height": 43.32755632582324, + "width": 1.1368683772161603e-13, + "height": 43.32755632582327, "seed": 2073112614, "groupIds": [ "EAGsqV8poBWw0UWL68csZ" @@ -1576,14 +1561,13 @@ "type": 2 }, "boundElements": [], - 
"updated": 1754948583239, + "updated": 1758744681405, "link": null, "locked": false, "startBinding": { "elementId": "YDN2CFYJD-5V9zKvieZpl", - "focus": -0.046979865771813255, - "gap": 6.932409012131643, - "fixedPoint": null + "focus": -0.04697986577181222, + "gap": 6.932409012131615 }, "endBinding": null, "lastCommittedPoint": null, @@ -1595,16 +1579,16 @@ 0 ], [ - 0, - -43.32755632582324 + 1.1368683772161603e-13, + -43.32755632582327 ] ], "elbowed": false }, { "type": "arrow", - "version": 229, - "versionNonce": 1896515269, + "version": 722, + "versionNonce": 1804472096, "index": "at", "isDeleted": false, "id": "YOuKvNLiB_LzXQXZW7Wxo", @@ -1614,11 +1598,11 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 967.9074170868691, - "y": -14.603184533410046, + "x": 1551.5198784599547, + "y": 242.28606128465753, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", - "width": 0, + "width": 1.1368683772161603e-13, "height": 47.52686308492207, "seed": 628477242, "groupIds": [ @@ -1629,14 +1613,13 @@ "type": 2 }, "boundElements": [], - "updated": 1754948583239, + "updated": 1758744681406, "link": null, "locked": false, "startBinding": { "elementId": "YDN2CFYJD-5V9zKvieZpl", - "focus": 0.03664429530201499, - "gap": 1, - "fixedPoint": null + "focus": 0.036644295302015725, + "gap": 1 }, "endBinding": null, "lastCommittedPoint": null, @@ -1648,7 +1631,7 @@ 0 ], [ - 0, + 1.1368683772161603e-13, 47.52686308492207 ] ], @@ -1656,8 +1639,8 @@ }, { "type": "arrow", - "version": 146, - "versionNonce": 1623335883, + "version": 386, + "versionNonce": 1759604448, "index": "au", "isDeleted": false, "id": "hu56RFp_O0eSyAIj111ZG", @@ -1667,8 +1650,8 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 718.8980966715407, - "y": 446.6884033517412, + "x": 1302.5105580446261, + "y": 703.5776491698089, "strokeColor": "#1971c2", "backgroundColor": "#ffffff", "width": 4.431314623338267, @@ -1680,7 +1663,7 @@ "type": 2 }, "boundElements": [], - "updated": 1754949671404, + "updated": 
1758744681398, "link": null, "locked": false, "startBinding": null, @@ -1703,8 +1686,8 @@ { "id": "DeszCljnW3zprs70KW2Bb", "type": "rectangle", - "x": 1308.5812732917261, - "y": 89.94928560621395, + "x": 1892.1937346648117, + "y": 346.83853142428154, "width": 461.18404569269643, "height": 653, "angle": 0, @@ -1720,8 +1703,8 @@ "index": "av", "roundness": null, "seed": 1874561419, - "version": 272, - "versionNonce": 141093125, + "version": 512, + "versionNonce": 1821414112, "isDeleted": false, "boundElements": [ { @@ -1729,16 +1712,16 @@ "id": "Evygu83Z" } ], - "updated": 1754949758280, + "updated": 1758744681398, "link": null, "locked": false }, { "id": "Evygu83Z", "type": "text", - "x": 1313.5812732917261, - "y": 95.14928560621394, - "width": 447.685791015625, + "x": 1897.1937346648117, + "y": 352.03853142428153, + "width": 447.63787841796875, "height": 642.6, "angle": 0, "strokeColor": "#1e1e1e", @@ -1753,11 +1736,11 @@ "index": "avV", "roundness": null, "seed": 1051274379, - "version": 164, - "versionNonce": 1173542341, + "version": 404, + "versionNonce": 1991980768, "isDeleted": false, - "boundElements": null, - "updated": 1754949757948, + "boundElements": [], + "updated": 1758744681398, "link": null, "locked": false, "text": "Normal users interact only with the\nmain tables. They have the same\nnames and interfaces as the\nexisting tables, and two extra\ncolumns.\n\n1 row ⬅➡ 1 collection item. No \nhistory, latest value only.\n\nTriggers on tables record actions in\nhistory tables. 
This denormalizes\nthe data (1 record duplicated for\nevery collection item).\n\nReferential integrity is\nautomatically enforced because the\nmain tables retain FK relationships.", @@ -1774,8 +1757,8 @@ { "id": "ZnkYWWcODhwf4txo7ad8p", "type": "rectangle", - "x": 1311.9295803250047, - "y": -454.97260799773994, + "x": 1895.5420416980903, + "y": -198.08336217967235, "width": 461.18404569269643, "height": 200, "angle": 0, @@ -1791,8 +1774,8 @@ "index": "avX", "roundness": null, "seed": 232379205, - "version": 481, - "versionNonce": 1023886149, + "version": 721, + "versionNonce": 737237728, "isDeleted": false, "boundElements": [ { @@ -1800,15 +1783,15 @@ "id": "F1yDIOn9" } ], - "updated": 1754949556344, + "updated": 1758744681398, "link": null, "locked": false }, { "id": "F1yDIOn9", "type": "text", - "x": 1316.9295803250047, - "y": -430.57260799773996, + "x": 1900.5420416980903, + "y": -173.68336217967237, "width": 419.66583251953125, "height": 151.20000000000002, "angle": 0, @@ -1824,11 +1807,11 @@ "index": "avY", "roundness": null, "seed": 1016160261, - "version": 253, - "versionNonce": 1628712485, + "version": 493, + "versionNonce": 516095712, "isDeleted": false, - "boundElements": null, - "updated": 1754949585943, + "boundElements": [], + "updated": 1758744681398, "link": null, "locked": false, "text": "All users interact with the tables.\n\n1 row ⬅➡ 1 collection item. 
No\nhistory, latest value only.", @@ -1845,8 +1828,8 @@ { "id": "P1OrraxFxBntBybZIMhag", "type": "rectangle", - "x": 1307.4951183471896, - "y": 961.837993913957, + "x": 1891.1075797202752, + "y": 1218.7272397320246, "width": 457.56553802218565, "height": 956, "angle": 0, @@ -1862,8 +1845,8 @@ "index": "avd", "roundness": null, "seed": 1863617931, - "version": 583, - "versionNonce": 656808523, + "version": 823, + "versionNonce": 264250080, "isDeleted": false, "boundElements": [ { @@ -1871,16 +1854,16 @@ "id": "T1mKftlv" } ], - "updated": 1754949771748, + "updated": 1758744681398, "link": null, "locked": false }, { "id": "T1mKftlv", "type": "text", - "x": 1312.4951183471896, - "y": 967.337993913957, - "width": 444.98974609375, + "x": 1896.1075797202752, + "y": 1224.2272397320246, + "width": 443.94183349609375, "height": 945.0000000000001, "angle": 0, "strokeColor": "#1e1e1e", @@ -1895,11 +1878,11 @@ "index": "avh", "roundness": null, "seed": 2124587819, - "version": 242, - "versionNonce": 533442507, + "version": 482, + "versionNonce": 762889952, "isDeleted": false, - "boundElements": null, - "updated": 1754949771119, + "boundElements": [], + "updated": 1758744681398, "link": null, "locked": false, "text": "Normal users do not interact\ndirectly with history tables.\n\nHistory tables are append-only.\nEvery past state of every collection\nitem is stored in its history table.\n\n1 collection item ⬅➡ many rows. \n\"Item\" means \"has same collection\nitem id\". 
There is a separate,\nunique PK, the history id\n(xxx_hx_id), for each row in the\nhistory table.\n\nThe history table and the main\ntable are maintained such that the\nlatest history record for each\n(undeleted) item is the same as the\nmain record for it; deleted items\nare of course not present in\nprimary table.\n\nHistory tables contain additional \ninformation specifically related to \nhistory-tracking.", @@ -1918,7 +1901,7 @@ "theme": "light", "viewBackgroundColor": "#ffffff", "currentItemStrokeColor": "#1e1e1e", - "currentItemBackgroundColor": "#fff9db", + "currentItemBackgroundColor": "#e9ecef", "currentItemFillStyle": "solid", "currentItemStrokeWidth": 0.5, "currentItemStrokeStyle": "solid", @@ -1930,10 +1913,10 @@ "currentItemStartArrowhead": null, "currentItemEndArrowhead": "arrow", "currentItemArrowType": "round", - "scrollX": 900.889407154292, - "scrollY": 2363.9270172941456, + "scrollX": 207.0591079237839, + "scrollY": -88.17503256305879, "zoom": { - "value": 0.451013 + "value": 0.400951 }, "currentItemRoundness": "sharp", "gridSize": 20, @@ -1952,12 +1935,12 @@ }, "objectsSnapModeEnabled": false, "activeTool": { - "type": "selection", + "type": "hand", "customType": null, "locked": false, + "fromSelection": false, "lastActiveTool": null } }, - "prevTextMode": "parsed", "files": {} } \ No newline at end of file