Skip to content

[COST-7401] Wasted Cost Fix#6029

Open
myersCody wants to merge 23 commits into
mainfrom
wasted_cost_solutions
Open

[COST-7401] Wasted Cost Fix#6029
myersCody wants to merge 23 commits into
mainfrom
wasted_cost_solutions

Conversation

@myersCody
Copy link
Copy Markdown
Contributor

Jira Ticket

COST-####

Description

This change will ...

Testing

  1. Checkout Branch
  2. Restart Koku
  3. Hit endpoint or launch shell
    1. You should see ...
  4. Do more things...

Release Notes

  • proposed release note
* [COST-####](https://issues.redhat.com/browse/COST-####) Fix some things

@myersCody myersCody requested review from a team as code owners April 24, 2026 19:45
@github-actions github-actions Bot added the ok-to-skip-smokes Changes are not hit by smoke tests (doc updates for example) label Apr 24, 2026
@myersCody myersCody marked this pull request as draft April 24, 2026 19:46
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a new architectural document, solution-options-and-limitations.md, which analyzes limitations in efficiency score calculations like aggregation bias and cost double-counting. It also updates the README and changelog. Feedback suggests targeting summary tables like OCPPodSummaryP for proposed fixes to align with reporting logic and removing tool-specific IDE references to keep documentation tool-agnostic.

Comment thread docs/architecture/efficiency-scores/solution-options-and-limitations.md Outdated
Comment thread docs/architecture/efficiency-scores/solution-options-and-limitations.md Outdated
@myersCody myersCody marked this pull request as ready for review April 29, 2026 13:20
@myersCody myersCody force-pushed the wasted_cost_solutions branch from ed0a583 to 4536352 Compare May 5, 2026 19:58
@myersCody myersCody removed the ok-to-skip-smokes Changes are not hit by smoke tests (doc updates for example) label May 5, 2026
@github-actions github-actions Bot added the smokes-required Label to show that smokes tests should be run against these changes. label May 5, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.3%. Comparing base (a8beba3) to head (31ae100).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #6029   +/-   ##
=====================================
  Coverage   94.3%   94.3%           
=====================================
  Files        362     362           
  Lines      32029   32028    -1     
  Branches    3522    3524    +2     
=====================================
  Hits       30218   30218           
  Misses      1174    1174           
+ Partials     637     636    -1     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@myersCody myersCody changed the title [COST-7401] Potential wasted cost solutions [COST-7401] Wasted Cost Fix May 6, 2026
@myersCody myersCody marked this pull request as draft May 11, 2026 13:52
@myersCody myersCody added the ocp-smoke-tests pr_check will run ocp + ocp on cloud smoke tests, used when changes affect ocp. label May 15, 2026
@myersCody
Copy link
Copy Markdown
Contributor Author

/retest

@koku-ci-triager-bot
Copy link
Copy Markdown
Collaborator

🤖 CI Triager — Diagnosis

Check: Red Hat Konflux / koku-ci / koku
PipelineRun: koku-ci-2x9lh
Root cause: A Koku-side regression caused by this PR. The IQE test test_api_ocp_compute_unsupported_group_bys_return_empty_score checks that multi-dimensional group_by returns an empty total_score aggregate, but this PR's wasted_cost addition now populates that field even in the unsupported multi-dimensional case.
Evidence:

FAILED tests/rest_api/v1/test__ocp_compute_reports.py::test_api_ocp_compute_unsupported_group_bys_return_empty_score
AssertionError: Multi-dimensional group_by should return empty total_score, got
  {'wasted_cost': {'value': 0.0, 'units': 'USD'}}

assert {'wasted_cost': {'units': 'USD', 'value': 0.0}} == {}

Left contains 1 more item:
  {'wasted_cost': {'units': 'USD', 'value': 0.0}}

The test expects total_score to be an empty dict {} when unsupported group_bys are used, but the new wasted_cost field is being returned.

Action: The OCP compute reports handler should not include wasted_cost in the total_score when an unsupported multi-dimensional group_by combination is used, or alternatively the IQE test in iqe-cost-management-plugin needs to be updated to expect wasted_cost in that response. Since the IQE test lives in a private repo, coordinate with the QE team to update test_api_ocp_compute_unsupported_group_bys_return_empty_score to account for the new field, or fix the Koku API to return an empty total_score for this unsupported group_by case.

Generated automatically. Review before applying.

@myersCody myersCody marked this pull request as ready for review May 19, 2026 13:49
@myersCody myersCody enabled auto-merge (squash) May 19, 2026 13:50
@koku-ci-triager-bot
Copy link
Copy Markdown
Collaborator

🤖 CI Triager — Diagnosis

Check: Red Hat Konflux / koku-ci / koku
PipelineRun: koku-ci-m6648
Root cause: IQE infrastructure failure — the RBAC service returned HTTP 500 during test environment setup (creating user permission groups), before any IQE tests ran. This is unrelated to the code changes in this PR.
Evidence:

iqe_rbac_api.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: {..., 'Date': 'Tue, 19 May 2026 14:04:26 GMT', ...}
HTTP response body: <html><h1>Server Error (500)</h1></html>

in: _add_roles_to_group → rbac_client.group_api.add_role_to_group(group.uuid, ...)
Some jobs failed: {'koku-iqe-kxuzgso': 'Failed'}

Action: Re-trigger the Konflux CI pipeline (e.g., via /retest comment or push a new commit). The RBAC 500 error is a pre-existing infra issue unrelated to this PR's wasted_cost changes.

Generated automatically. Review before applying.

Copy link
Copy Markdown
Collaborator

@koku-ci-triager-bot koku-ci-triager-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 CI Triager Suggestion

Check: Units - 3.11
Root cause: Two bugs in query_handler.py introduced by this PR cause three test failures:

  1. _format_query_response() unconditionally adds total_score: {} to all OCP reports — the tests require it to be absent from cost and volume reports.
  2. The should_compute=False branch (multi-group-by) pops wasted_cost from query_sum, but test_efficiency_score_multi_group_by_omits_efficiency_percent expects wasted_cost to still appear in total_score (it is a plain SUM, valid regardless of group-by cardinality).

Accept each suggestion below with one click.

self.query_sum = self._pack_data_object(self.query_sum, **self._mapper.PACK_DEFINITIONS)
if "score" in self.query_sum:
self.query_sum["total_score"] = self.query_sum.pop("score")
self.query_sum["total_score"] = self.query_sum.pop("score", {})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total_score must only be injected for compute (cpu/memory) reports. Cost and volume reports have no efficiency score, so this key should be absent entirely.

Suggested change
self.query_sum["total_score"] = self.query_sum.pop("score", {})
if self._report_type in ("cpu", "memory"):
self.query_sum["total_score"] = self.query_sum.pop("score", {})
else:
self.query_sum.pop("score", None)

Comment on lines +299 to +302
else:
query_sum.pop("usage_efficiency", None)
query_sum.pop("wasted_cost", None)
query_sum.pop("wasted_cost_units", None)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasted_cost is a simple aggregate SUM and is meaningful even when multiple group-by dimensions are used. Only usage_efficiency_percent (a ratio-of-sums) should be suppressed for multi-group-by. Keep wasted_cost and its units so the packing step can include it in total_score.

Suggested change
else:
query_sum.pop("usage_efficiency", None)
query_sum.pop("wasted_cost", None)
query_sum.pop("wasted_cost_units", None)
else:
query_sum.pop("usage_efficiency", None)
query_sum["wasted_cost_units"] = self.currency

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ocp-smoke-tests pr_check will run ocp + ocp on cloud smoke tests, used when changes affect ocp. smokes-required Label to show that smokes tests should be run against these changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants