[COST-7249] Phase 3: Migrate monthly/tag/VM cost SQL to RTU pipeline #6043
jordigilh wants to merge 17 commits into
Code Review
This pull request implements Phase 3 of the cost model refactor, migrating OpenShift cost calculations (monthly, tag-based, and VM costs) to the new rates_to_usage intermediate table. It includes updates to the database accessors, SQL templates for both Postgres and Trino, and the cost updater logic to handle rate metadata like UUIDs and custom names. Feedback identifies a high-severity issue where the legacy processing path is broken due to missing aggregation steps for the redirected SQL outputs. Other improvements were suggested regarding consistent usage_end date handling in aggregation SQL and ensuring proper fallback logic for rate information lookups in tag-based cost population.
    if rtu_enabled:
        if self._cost_model_id:
            self._update_usage_rates_to_usage(start_date, end_date)
            self._update_monthly_cost(start_date, end_date)
            if self._tag_infra_rates != {} or self._tag_supplementary_rates != {}:
                self._delete_tag_usage_costs(start_date, end_date, self._provider.uuid)
                self._update_tag_usage_costs(start_date, end_date)
                self._update_tag_usage_default_costs(start_date, end_date)
                self._update_monthly_tag_based_cost(start_date, end_date)
                self._update_node_hour_tag_based_cost(start_date, end_date)
                with OCPReportDBAccessor(self._schema) as report_accessor:
                    cluster_params = {
                        "cluster_id": self._cluster_id,
                        "cluster_alias": self._cluster_alias,
                    }
                    report_accessor.populate_tag_based_costs(
                        start_date,
                        end_date,
                        self._provider.uuid,
                        self.metric_to_tag_params_map,
                        cluster_params,
                        cost_model_id=self._cost_model_id,
                        rate_info_map=self._rate_info_map,
                    )
            if not (self._tag_infra_rates or self._tag_supplementary_rates):
                self._delete_tag_usage_costs(start_date, end_date, self._provider.uuid)
            self._aggregate_rates_to_daily_summary(start_date, end_date)
            self._update_vm_usage_costs(start_date, end_date)
        else:
            self._cleanup_stale_rtu_costs(start_date, end_date)
    else:
        self._update_usage_costs(start_date, end_date)
        self._update_monthly_cost(start_date, end_date)
        if self._tag_infra_rates != {} or self._tag_supplementary_rates != {}:
            self._delete_tag_usage_costs(start_date, end_date, self._provider.uuid)
            self._update_tag_usage_costs(start_date, end_date)
            self._update_tag_usage_default_costs(start_date, end_date)
            self._update_monthly_tag_based_cost(start_date, end_date)
            self._update_node_hour_tag_based_cost(start_date, end_date)
            with OCPReportDBAccessor(self._schema) as report_accessor:
                cluster_params = {
                    "cluster_id": self._cluster_id,
                    "cluster_alias": self._cluster_alias,
                }
                report_accessor.populate_tag_based_costs(
                    start_date,
                    end_date,
                    self._provider.uuid,
                    self.metric_to_tag_params_map,
                    cluster_params,
                )
        if not (self._tag_infra_rates or self._tag_supplementary_rates):
            self._delete_tag_usage_costs(start_date, end_date, self._provider.uuid)
    self._update_markup_cost(start_date, end_date)
The logic for processing monthly, tag-based, and VM costs is duplicated between the rtu_enabled and legacy branches. More importantly, the legacy path appears to be broken for these cost types. Since the underlying SQL files (e.g., monthly_cost_cluster_and_node.sql, infrastructure_tag_rates.sql) have been modified to target the rates_to_usage table instead of the daily summary table, and the legacy branch (lines 827-850) does not include the _aggregate_rates_to_daily_summary call, these costs will never reach the daily summary reports when the feature flag is OFF. This contradicts the PR description's goal of preserving the legacy path.
Resolved in commit da8808b. The duplicated flag branches and the broken legacy path were both addressed by making the RTU pipeline unconditional — all cost types now flow through a single code path (RTU INSERT → aggregation → daily summary). The feature flag is still read but emits a warning when OFF; the legacy _update_usage_costs() method is preserved for git-revert rollback. See updated PR description § "Orchestration: RTU pipeline unconditional" for full rationale.
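A minimal sketch of the single code path this reply describes, assuming a hypothetical `SketchUpdater` whose steps only record their names (the real updater methods take start/end dates and run SQL). The step order mirrors the one later asserted in TC-R20-01; the `cleanup` branch for a missing cost model is this sketch's assumption, modeled on the Phase 2 structure quoted elsewhere in this thread.

```python
import logging

logger = logging.getLogger(__name__)


class SketchUpdater:
    """Hypothetical stand-in that records which cost steps run, in order."""

    def __init__(self, cost_model_id):
        self.cost_model_id = cost_model_id
        self.calls = []

    def _step(self, name):
        self.calls.append(name)

    def update_cost_model_costs(self, rtu_enabled):
        # The flag is still read, but OFF only logs a warning; it no longer
        # selects a different write path.
        if not rtu_enabled:
            logger.warning("RTU flag is OFF; running the RTU pipeline anyway")
        if self.cost_model_id:
            self._step("rtu")  # usage costs -> rates_to_usage
        else:
            self._step("cleanup")  # drop stale rates_to_usage rows
        self._step("monthly")  # monthly/tag costs -> rates_to_usage
        self._step("vm")  # VM costs -> rates_to_usage
        self._step("agg")  # rates_to_usage -> daily summary
        self._step("markup")
        return self.calls


flag_on = SketchUpdater("cost-model-1").update_cost_model_costs(rtu_enabled=True)
flag_off = SketchUpdater("cost-model-1").update_cost_model_costs(rtu_enabled=False)
print(flag_on == flag_off)  # True
```

In this shape the flag only toggles a log line, so no cost type is written by a path whose rows the aggregation step would later delete without rewriting; that is the data-loss concern the reply cites.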
    for value_to_skip in value_names:
        key_value_pair.append(json.dumps({tag_key: value_to_skip}))
    json.dumps(key_value_pair)
    tag_rate_info = rate_info_map.get((metric, cost_type_label, tag_key), {})
In populate_tag_usage_default_costs, the tag_rate_info lookup is missing the fallback to the metric-level rate_info that is correctly implemented in populate_tag_usage_costs (line 1123). This may result in missing custom_name or rate_uuid values if they are defined at the metric level but not specifically for the tag key.
Suggested change:
    - tag_rate_info = rate_info_map.get((metric, cost_type_label, tag_key), {})
    + tag_rate_info = rate_info_map.get((metric, cost_type_label, tag_key), rate_info_map.get((metric, cost_type_label, ""), {}))
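A small sketch of the fallback this suggestion asks for, assuming `rate_info_map` is a plain dict keyed by `(metric, cost_type_label, tag_key)` with the metric-level entry stored under an empty `tag_key`. The metric names, cost types, and metadata values below are illustrative only.

```python
def lookup_rate_info(rate_info_map, metric, cost_type_label, tag_key):
    """Return tag-specific rate metadata, else the metric-level entry, else {}."""
    metric_level = rate_info_map.get((metric, cost_type_label, ""), {})
    return rate_info_map.get((metric, cost_type_label, tag_key), metric_level)


# Illustrative data: one metric-level rate and one tag-specific override.
RATE_INFO_MAP = {
    ("cpu_core_usage_per_hour", "Infrastructure", ""): {"rate_uuid": "r-1", "custom_name": "CPU"},
    ("cpu_core_usage_per_hour", "Infrastructure", "app"): {"rate_uuid": "r-2", "custom_name": "CPU by app"},
}

print(lookup_rate_info(RATE_INFO_MAP, "cpu_core_usage_per_hour", "Infrastructure", "team")["rate_uuid"])  # r-1
```

Computing the metric-level entry once and passing it as the default keeps the lookup a single readable line, which is the shape the reply below adopts.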
Good catch. Fixed in commit c39760b — populate_tag_usage_default_costs now falls back to the metric-level rate_info (matching the pattern in populate_tag_usage_costs at line 1150):
    rate_info = rate_info_map.get((metric, cost_type_label, ""), {})
    # ...
    tag_rate_info = rate_info_map.get((metric, cost_type_label, tag_key), rate_info)

    rtu.namespace,
    rtu.node,
    rtu.usage_start,
    rtu.usage_start + interval '1 day' AS usage_end,
In Step 3, usage_end is hardcoded as rtu.usage_start + interval '1 day'. This is inconsistent with Step 2 (line 127), which uses MIN(rtu.usage_end). Since daily summary rows typically expect usage_start and usage_end to be the same date for daily granularity, and the RTU rows already have usage_end populated from the source data, using MIN(rtu.usage_end) is more accurate and maintains consistency across cost types.
Suggested change:
    - rtu.usage_start + interval '1 day' AS usage_end,
    + MIN(rtu.usage_end) AS usage_end,
This is a false positive. Both Step 2 (line 87) and Step 3 (line 158) use the same expression: rtu.usage_start + interval '1 day' AS usage_end. Neither uses MIN(rtu.usage_end). The pattern is consistent across both INSERT blocks and matches the daily granularity convention where usage_end = usage_start + 1 day.
FedRAMP Readiness Review: Phase 3 RTU Migration
Reviewed the full diff (29 files, 1,256+/1,116−) against NIST 800-53 Rev 5 controls. The migration pattern is solid overall: tenant isolation is preserved, SQL injection surface is reduced by removing

Critical: Flag-OFF path silently drops monthly/tag/VM costs
Problem. All 25 SQL templates now INSERT into
Root cause. Phase 2 kept monthly/tag processing outside the flag check (they still targeted the daily summary directly). This PR moved them inside both branches and changed the SQL target, but only added aggregation to the flag-ON branch.
Recommended fix. Preserve the Phase 2 structural pattern: the flag gates only usage-cost routing. Monthly/tag/VM are now fully migrated and should sit outside the flag check, with aggregation always running after them:

    # Flag gates ONLY usage costs (Phase 2 scope):
    if rtu_enabled:
        if self._cost_model_id:
            self._update_usage_rates_to_usage(...)
        else:
            self._cleanup_stale_rtu_costs(...)
    else:
        self._update_usage_costs(...)
    # Phase 3: always through RTU (migration complete):
    self._update_monthly_cost(...)
    # ... tag cost methods ...
    self._update_vm_usage_costs(...)
    # Aggregation always runs:
    self._aggregate_rates_to_daily_summary(...)
    self._update_markup_cost(...)

This matches the Phase 2 test

High: DELETE scope broadened in tag-based cost files
Fix: Restore an equivalent filter on

Medium:
@myersCody, heads-up on one architectural decision in this PR that needs your sign-off before merge.

Feature flag is now warning-only (RTU unconditional)
Phase 3 converts all 25 monthly/tag/VM SQL files to INSERT into
Question: Are you comfortable with the flag becoming warning-only, with git revert as the rollback mechanism? If you'd prefer to preserve the flag as a kill switch, the alternative is maintaining dual SQL versions for all 25 files, which we assessed as higher risk. The full rationale is documented in the updated risk register (R20, Phase 3 section) in this PR. Everything else in the audit fixes (SHA-256, metric_type centralization, DELETE scope, tests) aligns with Phase 2 patterns and documented options. See the updated PR description for a commit-by-commit breakdown.
/retest
Codecov Report
❌ Patch coverage is
Additional details and impacted files

    @@            Coverage Diff            @@
    ##             main    #6043     +/-   ##
    =========================================
    - Coverage     94.4%    94.4%   -0.0%
    =========================================
      Files          362      362
      Lines        32101    32140     +39
      Branches      3538     3542      +4
    =========================================
    + Hits         30293    30327     +34
    - Misses        1173     1179      +6
    + Partials       635      634      -1
/retest
Yes! This is part of the reason I pushed for these to be done in separate phases: I knew these files would not be controllable through an Unleash flag. The hope is to get phase 2 into production before we merge phase 3. Once phase 2 is in prod with no issues, we can remove the Unleash flag. I always planned for the rollback strategy for phase 3 to be reverting the commit. However, I did not foresee you needing another migration, which complicates the revert strategy. I think what we need to do is pull the migration out of this PR and merge it beforehand. We can extend the capacity of the label hash prior to these changes with no impact on the system. That allows us to push in these changes and have a clean revert if we experience problems.
    cluster_id,
    cluster_alias,
    data_source,
    NULL AS persistentvolumeclaim,
Why add the column to the insert if it is always null?
Valid point — persistentvolumeclaim is always NULL for VM costs since these rows come from data_source = 'Pod'. The column is nullable with a NULL default, and the aggregation JOIN handles NULLs via IS NOT DISTINCT FROM, so omitting it from the INSERT is safe.
Will remove it. The same pattern applies to volume_labels (also always NULL in this file) — will trim that too for consistency. Applies to hourly_vm_core.sql, hourly_cost_vm_tag_based.sql, and hourly_vm_core_tag_based.sql as well.
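The NULL-default argument can be checked in miniature. This sketch uses SQLite through Python's `sqlite3` module as a stand-in for PostgreSQL: SQLite's `IS` operator plays the role of `IS NOT DISTINCT FROM`, and the two tables are simplified, hypothetical versions of the RTU and daily summary tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rtu (node TEXT, persistentvolumeclaim TEXT, cost REAL);
    CREATE TABLE summary (node TEXT, persistentvolumeclaim TEXT);
    INSERT INTO summary (node, persistentvolumeclaim) VALUES ('node-a', NULL);
    -- persistentvolumeclaim is omitted here, so it takes its NULL default.
    INSERT INTO rtu (node, cost) VALUES ('node-a', 1.5);
    """
)

join_sql = (
    "SELECT COUNT(*) FROM rtu JOIN summary AS s "
    "ON rtu.node = s.node AND rtu.persistentvolumeclaim {op} s.persistentvolumeclaim"
)
plain_eq = conn.execute(join_sql.format(op="=")).fetchone()[0]  # NULL = NULL is not true
null_safe = conn.execute(join_sql.format(op="IS")).fetchone()[0]  # NULL IS NULL is true
conn.close()

print(plain_eq, null_safe)  # 0 1
```

Omitting the column from the INSERT produces exactly the same stored value as writing `NULL` explicitly; what matters is that the JOIN compares it with a NULL-safe operator.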
Done — removed `persistentvolumeclaim` and `volume_labels` from all four VM hourly SQL files in commit `7794ad9`. The INSERT columns now exactly match the non-NULL values the SELECT produces.
GitHub is down right now and I can't leave comments on the PR lines directly. Sorry for the annoying comments that are going to start rolling in, but I want to get you some feedback ASAP. Feedback:
What is the benefit of basically recreating the JSON structure that we removed here? I feel like we now have the ability to query the Rate table directly to see if the rate exists. Additionally, how are we handling scenarios where we have more than one price list? Are we taking the validity period into consideration here?
Agreed, will split the migration out into a standalone PR. The migration (
One change from the current migration: removing the reverse-only
Will open the migration PR shortly and rebase Phase 3 to exclude it.
Agreed: the
The intent was a cached lookup table ( That said, you're right that it partially recreates the structure we removed. An alternative is to query Rate directly per-invocation with the indexes already on the table (
Valid catch: this is a bug. The current

    Rate.objects.filter(price_list__cost_model_maps__cost_model=self.cost_model)

This does not scope to the effective price list for the billing period. All other rate properties (
Will fix by scoping the Rate query to the effective price list when
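One way to scope the lookup, sketched under stated assumptions: `PriceList`, `effective_start`/`effective_end`, and the rate dict fields are hypothetical; a list counts as effective when its validity window covers the first day of the billing period; and when two effective lists define the same key, the later one in the input wins. The thread does not settle how overlapping lists should be ranked, so that last rule is only a placeholder.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PriceList:
    """Hypothetical price list: a validity window plus its rates."""

    name: str
    effective_start: date
    effective_end: Optional[date] = None  # None means open-ended
    rates: list = field(default_factory=list)


def effective_price_lists(price_lists, billing_start):
    """Keep only the price lists whose validity window covers billing_start."""
    return [
        pl
        for pl in price_lists
        if pl.effective_start <= billing_start
        and (pl.effective_end is None or billing_start <= pl.effective_end)
    ]


def build_rate_info_map(price_lists, billing_start):
    """Build (metric, cost_type, tag_key) -> metadata from effective lists only."""
    rate_info_map = {}
    for pl in effective_price_lists(price_lists, billing_start):
        for rate in pl.rates:
            key = (rate["metric"], rate["cost_type"], rate.get("tag_key", ""))
            # Placeholder rule: a later effective list overwrites an earlier one.
            rate_info_map[key] = {"rate_uuid": rate["uuid"], "custom_name": rate.get("custom_name")}
    return rate_info_map


OLD = PriceList(
    name="2025 rates",
    effective_start=date(2025, 1, 1),
    effective_end=date(2025, 12, 31),
    rates=[{"metric": "cpu_core_usage_per_hour", "cost_type": "Infrastructure", "uuid": "old"}],
)
NEW = PriceList(
    name="2026 rates",
    effective_start=date(2026, 1, 1),
    rates=[{"metric": "cpu_core_usage_per_hour", "cost_type": "Infrastructure", "uuid": "new"}],
)

KEY = ("cpu_core_usage_per_hour", "Infrastructure", "")
print(build_rate_info_map([OLD, NEW], date(2025, 6, 1))[KEY]["rate_uuid"])  # old
```

Keeping the validity check inside the one function that builds the map means every SQL template that consumes it sees the same answer for the billing period.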
82dc125 to 5ea5827
@jordigilh You bring up a good point about the N+1 query. An additional benefit of this approach is that it lets you work out the validity periods of price lists in one place instead of 15 different places. I am sold on keeping it a cached property.
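A sketch of the cached-property shape agreed on here, using `functools.cached_property`. `SketchCostModelAccessor` and its row format are hypothetical, and a counter stands in for the Rate query so the caching is visible.

```python
from functools import cached_property


class SketchCostModelAccessor:
    """Hypothetical accessor whose rate lookup runs at most once per instance."""

    def __init__(self, rate_rows):
        self._rate_rows = rate_rows  # stand-in for Rate query results
        self.lookups = 0

    @cached_property
    def rate_info_map(self):
        # A counter stands in for the single Rate query the lookup would issue.
        self.lookups += 1
        return {
            (row["metric"], row["cost_type"], row.get("tag_key", "")): {
                "rate_uuid": row["uuid"],
                "custom_name": row.get("custom_name"),
            }
            for row in self._rate_rows
        }


accessor = SketchCostModelAccessor(
    [{"metric": "cpu_core_usage_per_hour", "cost_type": "Infrastructure", "uuid": "r-1"}]
)
for _ in range(15):  # e.g. one access per SQL template
    accessor.rate_info_map
print(accessor.lookups)  # 1
```

The cache lives on the accessor instance, so a new accessor recomputes the map, and rate changes made during one instance's lifetime are not picked up by it.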
b9ee8b5 to e05c012
Extend label_hash from VARCHAR(32) (MD5) to VARCHAR(64) in preparation for Phase 3's SHA-256 upgrade. This is a backwards-compatible schema change with zero runtime impact on existing MD5 hashes. Split out from Phase 3 PR project-koku#6043 per TL request so that Phase 3 remains a clean git-revert without migration entanglement.
Co-authored-by: Cursor <cursoragent@cursor.com>
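The width change follows from the hex digest sizes: MD5 produces 16 bytes (32 hex characters) and SHA-256 produces 32 bytes (64 hex characters). A quick check with Python's `hashlib`, using an arbitrary illustrative payload:

```python
import hashlib

payload = b'{"app": "web"}'  # illustrative label text

md5_hex = hashlib.md5(payload).hexdigest()
sha256_hex = hashlib.sha256(payload).hexdigest()

# An MD5 hash fits VARCHAR(32); a SHA-256 hash needs VARCHAR(64).
print(len(md5_hex), len(sha256_hex))  # 32 64
```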
/retest |
b8239cd to 4ef269b
🤖 CI Triager: Diagnosis
Check: Red Hat Konflux / koku-ci / koku
Action: Re-trigger the
Generated automatically. Review before applying.
🤖 CI Triager: Diagnosis
Check: Red Hat Konflux / koku-ci / koku
Evidence: Affected files in diff:
Action: Investigate whether the changes to
Generated automatically. Review before applying.
97ae01d to 82ed053
Move all remaining cost model SQL files to INSERT into rates_to_usage instead of directly writing to reporting_ocpusagelineitem_daily_summary. This completes the RTU pipeline migration for all cost types.

SQL files converted (25 total):
- 4 PostgreSQL tag rate files (infra/supp specific + default)
- 3 PostgreSQL monthly cost files (node/cluster, PVC, VM)
- 2 PostgreSQL tag-based monthly files (node_cost_by_tag, PVC_by_tag)
- 8 Trino SQL files (VM hourly/monthly, VM core, project, GPU)
- 8 self-hosted SQL files (mirrors of Trino)

Accessor/orchestrator changes:
- Add rate_info_map property to CostModelDBAccessor for Rate UUID/custom_name lookup by (metric, cost_type, tag_key)
- Update 6 accessor methods to accept and pass cost_model_id, rate_uuid, custom_name to SQL templates
- Rewire orchestration: monthly/tag/VM processing now runs inside the RTU-enabled block, with legacy path preserved for flag-off

Aggregation expansion:
- Step 1 DELETE now covers all cost model rows (usage + monthly + tag)
- Step 2 INSERT: usage costs (monthly_cost_type IS NULL) with base JOIN
- Step 3 INSERT: monthly/tag costs (monthly_cost_type IS NOT NULL) without base JOIN, including GPU cost routing

Co-authored-by: Cursor <cursoragent@cursor.com>
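An in-memory sketch of the three aggregation steps listed above, assuming hypothetical row dicts with a `cost_model_row` marker and a `monthly_cost_type` field; the real steps are SQL over tables, and Steps 2 and 3 also aggregate and JOIN. It shows why a cost-model row written straight to the daily summary by another path does not survive Step 1.

```python
def aggregate_rates_to_daily_summary(daily_summary, rtu_rows):
    """In-memory sketch of Steps 1-3 (hypothetical row dicts, no real SQL)."""
    # Step 1: delete every cost-model row (usage, monthly and tag alike).
    kept = [row for row in daily_summary if not row["cost_model_row"]]
    # Step 2: usage costs are the rows with no monthly_cost_type.
    step2 = [
        {**row, "cost_model_row": True, "step": 2}
        for row in rtu_rows
        if row["monthly_cost_type"] is None
    ]
    # Step 3: monthly/tag costs are the rows with a monthly_cost_type.
    step3 = [
        {**row, "cost_model_row": True, "step": 3}
        for row in rtu_rows
        if row["monthly_cost_type"] is not None
    ]
    return kept + step2 + step3


summary = [
    {"id": "raw-usage", "cost_model_row": False},
    {"id": "legacy-direct-write", "cost_model_row": True},  # bypassed rates_to_usage
]
rtu = [
    {"id": "cpu-usage", "monthly_cost_type": None},
    {"id": "node-monthly", "monthly_cost_type": "Node"},
]
result = aggregate_rates_to_daily_summary(summary, rtu)
print([row["id"] for row in result])  # ['raw-usage', 'cpu-usage', 'node-monthly']
```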
Phase 3 converts all monthly/tag/VM SQL to INSERT into rates_to_usage. The legacy direct-write path is no longer viable because the aggregation DELETE would wipe those rows. The COST_BREAKDOWN_RTU_UNLEASH_FLAG is still read; when OFF, a warning is logged but the RTU path runs for all cost types to prevent silent data loss.
Co-authored-by: Cursor <cursoragent@cursor.com>
to SHA-256, restore tag DELETE scope
- Add _get_routing_metric_type() helper to OCPReportDBAccessor to derive
the aggregation bucket (cpu/memory/storage/gpu) from cost parameters.
Replace hardcoded metric_type in 25 SQL files with {{metric_type}}.
- Replace md5() with sha256() across all PostgreSQL, Trino, and
self-hosted SQL files. Widen label_hash column from VARCHAR(32) to
VARCHAR(64) via migration 0350.
- Restore tag_key scoping on DELETE in node_cost_by_tag.sql and
monthly_cost_persistentvolumeclaim_by_tag.sql to prevent over-deletion.
Co-authored-by: Cursor <cursoragent@cursor.com>
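A hypothetical sketch of what a routing helper in the spirit of `_get_routing_metric_type()` could look like. The `cost_kind` parameter and its values are invented for illustration (the helper's real signature is not shown in this thread), and the bucket-to-column mapping assumes the order given in the test plan (`cost_model_cpu_cost`, `cost_model_memory_cost`, `cost_model_volume_cost`, `cost_model_gpu_cost`).

```python
# Assumed mapping from aggregation bucket to daily summary cost column.
BUCKET_COLUMN = {
    "cpu": "cost_model_cpu_cost",
    "memory": "cost_model_memory_cost",
    "storage": "cost_model_volume_cost",
    "gpu": "cost_model_gpu_cost",
}


def get_routing_metric_type(cost_kind, distribution):
    """Hypothetical: pick the aggregation bucket for one cost-model SQL run."""
    if cost_kind == "gpu":
        return "gpu"
    if cost_kind == "pvc":
        return "storage"
    # Node, cluster and VM style costs follow the cost model's distribution.
    return "cpu" if distribution == "cpu" else "memory"


print(BUCKET_COLUMN[get_routing_metric_type("pvc", "cpu")])  # cost_model_volume_cost
```

Deciding the bucket once in Python and passing it as `{{metric_type}}` keeps the 25 SQL files free of their own `CASE` logic, which is the centralization the commit describes.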
…HA-256
- R20: document Phase 3 realization: RTU unconditional, feature flag is warning-only, rollback via git revert (not flag toggle).
- R13: record md5 → sha256 upgrade, document built-in PG sha256(), migration 0350, no data migration needed.
- Update risk matrix and changelog (v1.3).
Co-authored-by: Cursor <cursoragent@cursor.com>
…outing tests
- TC-R20-01: update expected order to rtu → monthly → vm → agg → markup → dist.
- TC-R20-03: assert monthly/vm/agg/markup called even with no cost model.
- TC-55: assert RTU called when flag is OFF (flag override behavior).
- Add TestRoutingMetricType with 12 unit tests covering all aggregation buckets (cpu, memory, storage, gpu) for _get_routing_metric_type().
Co-authored-by: Cursor <cursoragent@cursor.com>
Avoid recomputing rate metadata lookup on every access.
Co-authored-by: Cursor <cursoragent@cursor.com>
…o fallback
- Run black on cost_model_db_accessor, ocp_report_db_accessor, and ocp_cost_model_cost_updater to fix CI Sanity pre-commit failures.
- Fix populate_tag_usage_default_costs to fall back to metric-level rate_info when tag-key-specific entry is missing (consistent with populate_tag_usage_costs).
Co-authored-by: Cursor <cursoragent@cursor.com>
Re-run formatting with the project-pinned black 22.12.0 (--line-length 119) to match CI Sanity pre-commit checks.
Co-authored-by: Cursor <cursoragent@cursor.com>
Phase 3 SQL now writes tag-based costs to rates_to_usage instead of daily_summary. Update test assertions to verify RatesToUsage rows and update mock expectations for new kwargs (cost_model_id, rate_info_map, source_uuid, report_period_id).
Co-authored-by: Cursor <cursoragent@cursor.com>
… issues
- Fix NULL cluster_id in node_cost_by_tag.sql allocated pod INSERT
(was hardcoded NULL instead of selecting cluster_id from temp table)
- Set cost_model=None and rate_info_map={} on CostModelDBAccessor
mocks to prevent MagicMock objects from reaching SQL queries
- Update test assertions to check RatesToUsage instead of
daily_summary for monthly/tag-based cost verification
Co-authored-by: Cursor <cursoragent@cursor.com>
…tion
Add a reverse-only TRUNCATE on rates_to_usage so that narrowing label_hash back to varchar(32) does not fail when 64-char SHA-256 hashes exist from earlier tests in the same session.
Co-authored-by: Cursor <cursoragent@cursor.com>
Phase 3 makes monthly/vm/agg/markup unconditional even when no effective price list exists. Update test_cleanup_when_no_effective_price_list to expect those calls, and broaden test_rtu_custom_name_fallback to accept monthly/node/cluster cost_type identifiers in addition to usage metrics.
Co-authored-by: Cursor <cursoragent@cursor.com>
23b44ec to 325b7e8
koku-ci-triager-bot left a comment
🤖 CI Triager Suggestion
Check: Sanity
Root cause: black reformatted koku/masu/test/processor/ocp/test_phase2_rates_to_usage.py because the multi-line set union expression doesn't follow black's style. Run pre-commit run --all-files locally to auto-fix.
Accept the suggestion below with one click.
    known_identifiers = set(metric_constants.COST_MODEL_USAGE_RATES) | set(
        metric_constants.COST_MODEL_MONTHLY_RATES
    ) | set(metric_constants.COST_MODEL_VM_USAGE_RATES) | set(
        metric_constants.COST_MODEL_NODE_RATES
    ) | {
        "Node", "Cluster", "PVC", "OCP_VM", "Node_Core_Month", "Node_Core_Hour",
    }
Apply black formatting for the multi-line set union expression.
Suggested change:
    - known_identifiers = set(metric_constants.COST_MODEL_USAGE_RATES) | set(
    -     metric_constants.COST_MODEL_MONTHLY_RATES
    - ) | set(metric_constants.COST_MODEL_VM_USAGE_RATES) | set(
    -     metric_constants.COST_MODEL_NODE_RATES
    - ) | {
    -     "Node", "Cluster", "PVC", "OCP_VM", "Node_Core_Month", "Node_Core_Hour",
    - }
    + known_identifiers = (
    +     set(metric_constants.COST_MODEL_USAGE_RATES)
    +     | set(metric_constants.COST_MODEL_MONTHLY_RATES)
    +     | set(metric_constants.COST_MODEL_VM_USAGE_RATES)
    +     | set(metric_constants.COST_MODEL_NODE_RATES)
    +     | {
    +         "Node",
    +         "Cluster",
    +         "PVC",
    +         "OCP_VM",
    +         "Node_Core_Month",
    +         "Node_Core_Hour",
    +     }
    + )
Co-authored-by: Cursor <cursoragent@cursor.com>
koku-ci-triager-bot left a comment
🤖 CI Triager Suggestion
Check: Sanity
Root cause: Two pre-commit hooks failed after the latest commit (2026-05-20T20:38:19Z):
- reorder-python-imports: 5 files have a blank line added between the module docstring and the first import; the tool removes it.
- black: test_common.py has an inline class body (class Foo: ...) that black 22.12.0 expands to multi-line; ocp_cost_model_cost_updater.py wraps a ternary in redundant parentheses that black removes.
Accept each suggestion below with one click.
    class ManifestFactory(ModelFactory[utils.Manifest]):
        ...
    class ManifestFactory(ModelFactory[utils.Manifest]): ...
black reformats inline class bodies to multi-line form.
Suggested change:
    - class ManifestFactory(ModelFactory[utils.Manifest]): ...
    + class ManifestFactory(ModelFactory[utils.Manifest]):
    +     ...
    log_json(
        msg="RTU pipeline path selected",
        rtu_enabled=rtu_enabled,
        cost_model_id=(str(self._cost_model_id) if self._cost_model_id else None),
black removes redundant parentheses around ternary expressions in keyword arguments.
Suggested change:
    - cost_model_id=(str(self._cost_model_id) if self._cost_model_id else None),
    + cost_model_id=str(self._cost_model_id) if self._cost_model_id else None,
    """Gunicorn configuration file."""
reorder-python-imports removes the blank line between the module docstring and the first import.
Suggested change:
    - """Gunicorn configuration file."""
    -
    + """Gunicorn configuration file."""
    """Tests for the PriceListManager."""
reorder-python-imports removes the blank line between the module docstring and the first import.
Suggested change:
    - """Tests for the PriceListManager."""
    -
    + """Tests for the PriceListManager."""
    """Tests for the PriceList and PriceListCostModelMap models."""
reorder-python-imports removes the blank line between the module docstring and the first import.
Suggested change:
    - """Tests for the PriceList and PriceListCostModelMap models."""
    -
    + """Tests for the PriceList and PriceListCostModelMap models."""
    """Masu OCP post processor module tests."""
reorder-python-imports removes the blank line between the module docstring and the first import.
Suggested change:
    - """Masu OCP post processor module tests."""
    -
    + """Masu OCP post processor module tests."""
    """
reorder-python-imports removes the blank line between the module docstring and the first import.
Suggested change:
    - """
    -
    + """
Co-authored-by: Cursor <cursoragent@cursor.com>
🤖 CI Triager: Diagnosis
Check: Red Hat Konflux / koku-ci / koku
Evidence: Pattern repeated across all test attempts (~23:35 → 02:35); all OCP ingest tests errored.
Action:
Generated automatically. Review before acting.
…UMENT
Trino's PostgreSQL connector exposes jsonb columns as json type, and in Trino CAST(json AS varchar) fails with INVALID_CAST_ARGUMENT for non-scalar JSON values such as label objects. Replace it with json_format(CAST(x AS json)), which serializes the JSON value to varchar for the sha256 label_hash computation.
COST-7249
Co-authored-by: Cursor <cursoragent@cursor.com>
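The fix works because the hash is taken over text rather than over a JSON value: in both engines SHA-256 takes binary input, so the JSON has to be serialized first. The same two steps in Python, with `json.dumps` standing in for Trino's `json_format`; this sketch's canonical form (sorted keys, compact separators) is its own choice and is not claimed to match Trino's or PostgreSQL's output byte for byte.

```python
import hashlib
import json


def label_hash(labels):
    """Serialize a label map to canonical text, then return its SHA-256 hex digest."""
    # Step 1 (json_format's role): JSON value -> text, with sorted keys so that
    # the same labels always serialize the same way.
    text = json.dumps(labels, sort_keys=True, separators=(",", ":"))
    # Step 2: text -> bytes -> SHA-256; the hex digest is 64 characters.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(len(label_hash({"app": "web", "team": "cost"})))  # 64
```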
🤖 CI Triager: Diagnosis
Check: Red Hat Konflux / koku-ci / koku
Evidence:
Action: The pipeline will eventually time out (~4 more hours). Two possible causes:
Generated automatically. Review before acting.
/retest
🤖 CI Triager: Diagnosis
Check: Red Hat Konflux / koku-ci / koku
Evidence:
Action: This is a Koku-side regression introduced by this PR. AI and MIG workloads trigger a code path that the new unconditional RTU pipeline (Phase 3) or the new GPU/VM SQL does not complete correctly. Investigate:
This is the third consecutive koku-ci run failing with the same pattern on this PR.
Generated automatically. Review before acting.
monthly_cost_gpu.sql was missed in the prior INVALID_CAST_ARGUMENT fix. Trino rejects CAST(json AS varchar); use json_format() instead. This is the code path triggered by AI and MIG workload IQE tests (test_api_ocp_source_ai_workloads_ingest, test_api_ocp_source_mig_workloads_ingest).
Co-authored-by: Cursor <cursoragent@cursor.com>
🤖 CI Triager: Diagnosis
Check: Red Hat Konflux / koku-ci / koku
Evidence: All 23 failures are in
Affected files:
Action: Verify that
Generated automatically. Review before applying.
Summary
Completes the RTU (RatesToUsage) pipeline migration for all remaining cost types — monthly, tag-based, and VM costs — building on the Phase 2 foundation merged in #6017.
After the initial SQL migration, a comprehensive readiness audit (FedRAMP, QE, SRE, Security, Architecture, DBA) identified several issues. This PR includes the original migration plus all audit fixes.
What changed (by commit)
1. Original Phase 3 SQL migration (2fec011)
rates_to_usage instead of direct-writing to reporting_ocpusagelineitem_daily_summary:
ocp_report_db_accessor.py to pass cost_model_id, rate_uuid, and custom_name to SQL templates via new rate_info_map property on CostModelDBAccessor

2. Orchestration: RTU pipeline unconditional (da8808b)
rates_to_usage, making the legacy direct-write path non-functional for these cost types.
COST_BREAKDOWN_RTU_UNLEASH_FLAG is still read; when OFF, a warning log is emitted but the RTU path runs anyway. This prevents silent data loss: if usage costs went through legacy direct-write while aggregation runs (triggered by Phase 3 costs in RTU), the aggregation DELETE would wipe the legacy rows.
_update_usage_costs() method is preserved (not called) for git-revert rollback.

3. SQL audit fixes (97767b7)
metric_type centralized: Added _get_routing_metric_type() static helper to OCPReportDBAccessor. Replaces hardcoded 'cpu', 'storage', 'gpu' and CASE expressions in 25 SQL files with {{metric_type}} template variable. The routing logic mirrors Phase 2's SQL-level decisions (R13 IQ-2):
'cpu' if distribution is cpu, else 'memory'
'storage'
'gpu'
usage_type
md5() with sha256() across all PostgreSQL, Trino, and self-hosted SQL files. sha256(bytea) is built into PostgreSQL 11+ (proven by koku/db_functions/jsonb_sha256_text.sql, in production since 2021). No pgcrypto needed.
0350 widens label_hash from VARCHAR(32) to VARCHAR(64)
AND rtu.pod_labels ? {{tag_key}} / AND rtu.volume_labels ? {{tag_key}} to the DELETE statements in node_cost_by_tag.sql and monthly_cost_persistentvolumeclaim_by_tag.sql to prevent over-deletion across tag keys

4. Risk register update (bc59356)

5. Tests (891753e)
rtu → monthly → vm → agg → markup → dist
cost_model_id is None
_get_routing_metric_type()

6. Cleanup (b726ce0)
rate_info_map on CostModelDBAccessor cached with @cached_property

Key design decisions
metric_type helper

Alignment with Phase 2
_make_orchestration_patches

Test plan
koku-ci pipeline)
metric_type into cost_model_cpu_cost, cost_model_memory_cost, cost_model_volume_cost, cost_model_gpu_cost
label_hash is 64-char hex (sha256) in rates_to_usage
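A minimal sketch of why the restored `? {{tag_key}}` filter matters, assuming hypothetical in-memory rows. Python's `in` on a dict checks top-level keys, which is what PostgreSQL's jsonb `?` operator does for an object; the other DELETE predicates (source, dates, cost model) are left out.

```python
def rows_to_delete(rtu_rows, tag_key, scoped=True):
    """Return the tag-cost rows a DELETE for tag_key would remove.

    scoped=True mirrors `AND rtu.pod_labels ? {{tag_key}}`; scoped=False is the
    broadened filter that matches every tag-cost row.
    """
    return [row for row in rtu_rows if not scoped or tag_key in row["pod_labels"]]


ROWS = [
    {"id": 1, "pod_labels": {"app": "web"}},
    {"id": 2, "pod_labels": {"team": "cost"}},
]

print([r["id"] for r in rows_to_delete(ROWS, "app")])  # [1]
print([r["id"] for r in rows_to_delete(ROWS, "app", scoped=False)])  # [1, 2]
```

Re-running the `app` rate without the key filter would also remove row 2, which belongs to the `team` rate: the over-deletion across tag keys that the fix prevents.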