Migrate Opower data reads from REST to GraphQL #177

Open
loganrosen wants to merge 5 commits into tronikos:main from loganrosen:graphql-bill-costs

Conversation

@loganrosen
Contributor

@loganrosen loganrosen commented Mar 1, 2026

Summary

Migrates Opower data reads from REST to GraphQL for:

  • Bill-level cost/usage reads
  • Interval usage/cost reads (day/hour/half-hour/quarter-hour)
  • Realtime usage reads

REST fallback for bill reads has been removed.

Key implementation changes

src/opower/opower.py

  • Added GraphQL bill-cost path via _async_get_bill_cost_reads()
  • Added GraphQL interval path via _async_discover_service_point() + _async_get_graphql_interval_reads()
  • Added client-side interval aggregation (_aggregate_interval_reads())
  • Interval reads pipeline uses CostRead internally, extracting monetaryAmount from GraphQL when available (defaults to 0 when null)
  • Updated public APIs:
    • async_get_cost_reads() — returns CostRead for both bill and sub-bill aggregate types
    • async_get_usage_reads() — returns UsageRead (converts from internal CostRead)
    • async_get_realtime_usage_reads() — returns UsageRead
  • Removed obsolete REST read paths for interval/realtime data
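The client-side aggregation described above can be illustrated with a minimal sketch. It is not the PR's `_aggregate_interval_reads()`: the `IntervalRead` dataclass, the `bucket_start()` helper, and the string granularity names are hypothetical stand-ins, and the sketch assumes every read carries a timezone-aware start time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo


@dataclass
class IntervalRead:
    """Hypothetical stand-in for the PR's internal CostRead."""
    start_time: datetime
    end_time: datetime
    consumption: float
    provided_cost: float


def bucket_start(ts: datetime, granularity: str, local_tz: tzinfo) -> datetime:
    """Floor an aware timestamp to the start of its local-time bucket."""
    local = ts.astimezone(local_tz)
    if granularity == "day":
        return local.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return local.replace(minute=0, second=0, microsecond=0)
    if granularity == "half_hour":
        return local.replace(minute=30 * (local.minute // 30), second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def aggregate_reads(
    reads: list[IntervalRead], granularity: str, local_tz: tzinfo
) -> list[IntervalRead]:
    """Sum consumption and cost of raw reads that fall into the same bucket."""
    buckets: dict[datetime, IntervalRead] = {}
    for read in reads:
        key = bucket_start(read.start_time, granularity, local_tz)
        agg = buckets.get(key)
        if agg is None:
            buckets[key] = IntervalRead(
                key, read.end_time, read.consumption, read.provided_cost
            )
        else:
            agg.end_time = max(agg.end_time, read.end_time)
            agg.consumption += read.consumption
            agg.provided_cost += read.provided_cost
    return [buckets[key] for key in sorted(buckets)]
```

Bucketing on the local timestamp rather than the UTC one is what keeps a daily total aligned with local midnight, which is the behavior the timezone bucketing regression test is meant to protect.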

src/opower/__main__.py

  • CLI output includes usage_charges and current_amount for bill-level reads

tests/test_opower.py

  • Added GraphQL fixture tests for bill parsing/filtering
  • Added interval discovery/fetch/cache tests
  • Added batching/discovery-failure/non-bill-cost tests
  • Added timezone bucketing regression test
  • Added monetaryAmount summation test for interval cost reads

Validation

  • Unit tests: 41 passed
  • pre-commit: clean on changed files
  • Live ConEd smoke tests passed for day, hour, and bill
  • Live-tested monetaryAmount field against ConEd — accepted by server but returns null (ConEd does not populate sub-bill cost data)

Known limitations

  • Gas interval filter values are currently best-effort (units: [CCF], serviceQuantityIdentifier: [DELIVERED]).
  • Electric/ConEd path is live-verified; gas GraphQL interval variants have not yet been live-validated.
  • monetaryAmount is null for ConEd interval reads. Other utilities may populate it — we carry it through when available.

Part of #176

@loganrosen loganrosen marked this pull request as draft March 1, 2026 23:02
@loganrosen loganrosen changed the title from "Add GraphQL bill-level cost data (Phase 1 of REST→GraphQL migration)" to "Migrate Opower data reads from REST to GraphQL" Mar 2, 2026
loganrosen and others added 2 commits March 1, 2026 22:19
Many utilities (ConEd, PSE, SCL, etc.) return providedCost=0 from
the REST DataBrowser API. The GraphQL API's bills query returns
actual cost data including usageCharges (energy only) and
currentAmount (total bill incl. delivery + taxes).

Changes:
- Add usage_charges and current_amount fields to CostRead dataclass
- Add _async_get_bill_cost_reads() using GraphQL bills query
- For AggregateType.BILL, try GraphQL first, fall back to REST
- Add variables support to _async_post_graphql()
- Update CLI tool to display new cost fields

Ref: tronikos#176

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all REST-based data fetching (DataBrowser-v1, real-time-ami-v1)
with Opower's GraphQL API for interval reads.

Key changes:
- Add _async_discover_service_point() for GraphQL service point/register
  discovery with per-account caching
- Add _async_get_graphql_interval_reads() with 24-hour batching
  (API enforces max 24h per request)
- Add _parse_interval_reads_response() for response parsing
- Add _aggregate_interval_reads() for client-side aggregation
  (DAY/HOUR/HALF_HOUR from raw quarter-hour meter data)
- Rewrite async_get_cost_reads() to use GraphQL for all aggregate types
- Rewrite async_get_usage_reads() to use GraphQL
- Rewrite async_get_realtime_usage_reads() with 2-day GraphQL lookback
- Remove dead REST code: _async_get_dated_data, _async_fetch,
  _async_get_meters, self.meters

GraphQL API requirements discovered during testing:
- onlyUnverifiedStreams: true is required on intervalReads field
- serviceQuantityIdentifier is required (ELEC: NET_USAGE, GAS: DELIVERED)
- Time intervals must be UTC format (yyyy-mm-ddThh:mm:ssZ)
- No matching filter on serviceAgreementsConnection (returns empty)
- singlePremise parameter needed on billingAccountByAuthContext

Bill-level costs (from _async_get_bill_cost_reads) provide real cost data.
Sub-bill intervals only have usage data; cost derivation from bill rates
would be a separate enhancement.

Tested live with ConEd: day, hour, and bill aggregation all working.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@loganrosen loganrosen force-pushed the graphql-bill-costs branch from a668e64 to 7a19965 March 2, 2026 03:21
@loganrosen loganrosen marked this pull request as ready for review March 2, 2026 03:21
@loganrosen loganrosen force-pushed the graphql-bill-costs branch from 5731eeb to bad1d86 March 2, 2026 03:37
Add monetaryAmount { value currency } to interval reads GraphQL query.
Change internal pipeline to use CostRead (instead of UsageRead) for
interval reads, carrying provided_cost through parsing and aggregation.

For utilities that populate monetaryAmount on ServiceQuantityRead (not
ConEd currently), cost data now flows through to async_get_cost_reads
at sub-bill resolution. Public async_get_usage_reads still returns
UsageRead by converting at the boundary.

Add test for monetaryAmount summation in cost reads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@loganrosen loganrosen force-pushed the graphql-bill-costs branch from bad1d86 to 094d3fb March 2, 2026 03:39
@tronikos
Owner

I haven't looked at the code. I just ran it for my utility PG&E and have several issues:

  • I got empty usage_charges and current_amount in the output.
  • The provided_cost is always 0 for me which is not the case when calling the REST API.
  • The consumption gives weird values. With aggregation set to day, the REST API returns -10.1933 (I have solar and I'm returning energy to the grid) while the new API returns 6.9. With aggregation set to hour, the consumption seems to match during the night (e.g. 0.5545 vs 0.554; note the lower precision from the new API) but not during the day, when I return energy to the grid: the REST API has negative values while the new API has 0.
  • When the daylight saving time began on March 8 I get no results from 4am to 6pm, that's a gap of 14 hours. With the REST API I have no gap.

Address PR feedback from tronikos testing with PG&E:

- Include interval reads with null measuredAmount as consumption=0
  instead of skipping them (fixes data gaps including DST transitions)
- Use local time with offset for interval query batches instead of UTC,
  matching the bill query format (fixes DST-related server issues)
- Bill provided_cost falls back to usageCharges when currentAmount is
  null (fixes provided_cost always showing 0 for some utilities)
- usage_charges and current_amount fields are now None (not 0) when the
  GraphQL response doesn't include them
- CLI only shows usage_charges/current_amount columns for bill
  aggregation (they're always None for interval reads)
- Extract CLI output helpers to reduce _main() complexity
- Added tests for null measuredAmount handling and bill cost fallback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
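The null-handling and cost-fallback rules from the commit above might look roughly like the sketch below. The helper names are hypothetical, and the `{"value": ...}` shape assumed for `measuredAmount`, `usageCharges`, and `currentAmount` is an assumption modeled on the `monetaryAmount { value currency }` selection mentioned earlier, not a confirmed schema.

```python
from typing import Any, Optional


def _amount(node: Optional[dict[str, Any]]) -> Optional[float]:
    """Return the numeric value of an amount node, or None when it is absent.

    Assumes (unverified) that amounts arrive as {"value": <number>}.
    """
    if node is None or node.get("value") is None:
        return None
    return float(node["value"])


def parse_consumption(read: dict[str, Any]) -> float:
    """Keep reads with a null measuredAmount, counting them as 0."""
    measured = _amount(read.get("measuredAmount"))
    return 0.0 if measured is None else measured


def bill_costs(bill: dict[str, Any]) -> tuple[float, Optional[float], Optional[float]]:
    """Return (provided_cost, usage_charges, current_amount) for one bill."""
    usage_charges = _amount(bill.get("usageCharges"))
    current_amount = _amount(bill.get("currentAmount"))
    # Prefer currentAmount; fall back to usageCharges only when it is missing.
    provided_cost = current_amount if current_amount is not None else usage_charges
    if provided_cost is None:
        provided_cost = 0.0
    return provided_cost, usage_charges, current_amount
```

The explicit `is not None` checks matter here: a truthiness test such as `current_amount or usage_charges` would treat a legitimate bill of 0 as missing and silently substitute the usage charges.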
@loganrosen
Contributor Author

loganrosen commented Mar 29, 2026

Thanks for testing with PG&E, @tronikos! I've pushed fixes for all four issues:

  1. Empty usage_charges/current_amount — these columns are now only shown for bill-level reads (they're always None for interval reads, so showing them was confusing).

  2. provided_cost always 0 — for bill reads, provided_cost now falls back to usageCharges when currentAmount is null. Previously it was hardcoded to currentAmount which was null for some utilities.

  3. Wrong consumption values (no negatives, different daily totals) — the parser was skipping interval reads where measuredAmount was null instead of including them as 0. This was dropping data points. They're now included with consumption=0. (Note: if PG&E's GraphQL endpoint genuinely doesn't return negative values for solar net metering, that would be a server-side difference from the REST API — I can't fix that on the client side. Worth checking if the raw GraphQL response contains negatives.)

  4. 14-hour DST gap — two fixes here: (a) null reads are no longer dropped (see point 3 above), and (b) interval query batches now use local time with UTC offset instead of converting to UTC, matching the format used by the bill query. This avoids ambiguity at DST boundaries.

I've verified all three aggregation types (bill/day/hour) against ConEd, including across the March 8 DST transition — no gaps.

Would be great if you could re-test with PG&E to confirm these fix the issues on your end!

ConEd's GraphQL endpoint requires UTC dates with Z suffix
(yyyy-mm-ddThh:mm:ssZ) for the timeInterval parameter. The previous
isoformat() call produced dates with timezone offsets which ConEd
rejected with HTTP 400: 'The provided asOf date format is not in the
format of yyyy-mm-ddThh:mm:ssZ'.

Additionally, batch in UTC rather than local time to avoid DST
ambiguity where a 24-hour local window can span 23 or 25 UTC hours
at DST transitions.
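As a rough illustration of the batching described in this commit message, the sketch below splits a range into windows of at most 24 UTC hours and formats each bound with a Z suffix. The function names are hypothetical, and returning plain (start, end) string pairs is an assumption about how the `timeInterval` argument is assembled.

```python
from collections.abc import Iterator
from datetime import datetime, timedelta, timezone


def format_utc(ts: datetime) -> str:
    """Render an aware datetime as yyyy-mm-ddThh:mm:ssZ."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def utc_batches(
    start: datetime, end: datetime, max_span: timedelta = timedelta(hours=24)
) -> Iterator[tuple[str, str]]:
    """Yield (start, end) string pairs covering [start, end) in UTC windows."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    cursor = start.astimezone(timezone.utc)
    stop = end.astimezone(timezone.utc)
    while cursor < stop:
        window_end = min(cursor + max_span, stop)
        yield format_utc(cursor), format_utc(window_end)
        cursor = window_end
```

For example, a two-day local range that crosses a spring-forward transition covers only 47 UTC hours, so it yields one full 24-hour window followed by a 23-hour one, and no window ever exceeds the 24-hour limit.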
@tronikos
Owner

tronikos commented Apr 30, 2026

I'm curious, what model and coding tool are you using?

I still haven't looked at the code or run the latest version. I asked Gemini 3.1 Pro and Claude Sonnet 4.6 to review the code and they said:

🔴 P1 — Critical / High Impact / Breaking Risks

1. Massive API call volume explosion (~135× increase) [Claude & Gemini]

  • Issue: _async_get_graphql_interval_reads batches requests into strictly 24-hour UTC windows to bypass GraphQL limits.
  • Impact: Fetching 3 years of historical hourly data now requires ~1,095 sequential HTTP requests (one per day) instead of the ~8 requests required by the old REST batching logic. This will drastically slow down initial data imports and heavily increase the risk of the user being rate-limited, IP-banned, or blocked by utility Web Application Firewalls (WAFs).

2. No fallback for utilities that lack GraphQL support [Claude]

  • Issue: The developer entirely deleted the REST API paths (DataBrowser-v1, etc.) which have been widely supported across Opower utilities for years, replacing them exclusively with the dsm-graphql-v1 endpoint.
  • Impact: If a specific utility has not yet migrated to or exposed this Oracle DSM GraphQL layer, the integration will completely break for those users with no fallback mechanism.

3. usage_only parameter is silently ignored [Claude & Gemini]

  • Issue: async_get_cost_reads retains usage_only: bool = False in its signature but does absolutely nothing with it.
  • Impact: Callers explicitly requesting usage-only data to save bandwidth or bypass cost-endpoint errors will now silently execute heavier cost-inclusive GraphQL queries. This violates the function's contract and breaks backward compatibility for error-handling fallbacks.

4. provided_cost semantics changed without a contract [Claude]

  • [Gemini Disagrees with Claude's Severity/Framing]
    • Claude's view: This is a critical breaking change because provided_cost now returns currentAmount (the full bill with taxes/fees) instead of just the energy usage cost, changing what downstream sensors display.
    • Gemini's view: While Claude is technically correct that the output changes, this is almost certainly an intentional bug fix, not a regression. A major historical complaint with the Opower REST API was that providedCost returned $0.0 or stripped out delivery fees, making the Home Assistant energy dashboard wildly inaccurate compared to the user's actual bill. Upgrading this to currentAmount is highly desirable. This should be treated as a Release Note / Breaking Change announcement for users, not a P1 code defect to be "fixed".

🟠 P2 — Significant Logic Bugs & Behavioral Regressions

5. QUARTER_HOUR aggregation assumes native resolution [Gemini]

  • Issue: _aggregate_interval_reads returns raw reads unchanged when aggregate_type == AggregateType.QUARTER_HOUR, assuming the meter's underlying data is exactly 15 minutes.
  • Impact: If a utility provides 5-minute, 1-minute, or 60-minute native data, callers requesting quarter-hour aggregation will receive the wrong granularity entirely. It needs to be mathematically bucketed using modulo math (e.g., minute=15 * (minute // 15)), just like HALF_HOUR is.
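The modulo bucketing suggested in item 5 could be sketched as follows. The helper names and the (start, value) tuple representation are hypothetical, chosen only to keep the example self-contained.

```python
from datetime import datetime, timezone


def floor_to_minutes(ts: datetime, step: int = 15) -> datetime:
    """Floor a timestamp to the nearest lower multiple of `step` minutes."""
    if step <= 0 or 60 % step != 0:
        raise ValueError("step must evenly divide 60")
    return ts.replace(minute=step * (ts.minute // step), second=0, microsecond=0)


def rebucket(
    reads: list[tuple[datetime, float]], step: int = 15
) -> list[tuple[datetime, float]]:
    """Sum (start, value) reads into `step`-minute buckets, sorted by start."""
    totals: dict[datetime, float] = {}
    for start, value in reads:
        key = floor_to_minutes(start, step)
        totals[key] = totals.get(key, 0.0) + value
    return sorted(totals.items())
```

Note that this only helps when the native data is finer than 15 minutes. If the meter reports 60-minute reads, flooring leaves one entry per hour, so quarter-hour output would still need a separate policy (for example rejecting the request), which this sketch does not attempt.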

6. Naive datetime skips timezone conversion [Gemini]

  • Issue: _aggregate_interval_reads only applies local timezone conversions if start_time.tzinfo is not None. However, _parse_interval_reads_response creates naive datetimes if the GraphQL API returns timestamps without a Z or offset (e.g., 2024-04-30T01:00:00).
  • Impact: If a naive datetime is parsed, the conversion is silently skipped. The daily bucketing logic will then use UTC midnight instead of local midnight, skewing daily energy totals across daylight saving time boundaries.
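One way to avoid the silent skip described in item 6 is to make the parser always return an aware datetime. The sketch below is an assumption-laden illustration: `parse_timestamp` is a hypothetical helper, and whether a naive timestamp should be read as utility-local time or as UTC is a policy decision the source does not settle, so the caller passes the zone explicitly.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def parse_timestamp(raw: str, default_tz: tzinfo) -> datetime:
    """Parse an ISO 8601 timestamp and guarantee the result is timezone-aware.

    A trailing "Z" is rewritten to "+00:00" because datetime.fromisoformat()
    only accepts "Z" from Python 3.11 onward.
    """
    text = raw[:-1] + "+00:00" if raw.endswith("Z") else raw
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive values are expressed in default_tz.
        parsed = parsed.replace(tzinfo=default_tz)
    return parsed
```

With every timestamp aware at parse time, the aggregation step no longer needs a `tzinfo is not None` guard, so there is no code path where local bucketing can be skipped.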

7. Service point discovery silently drops secondary meters [Claude & Gemini]

  • Issue: _async_discover_service_point caches and returns the very first registerId it finds in the GraphQL edge list.
  • Impact: Accounts with multiple meters (e.g., a primary consumption meter and a separate Net-Metering solar meter, or an EV sub-meter) will silently lose data for all but the first meter. There is no parameter allowing the caller to specify which service point to query.

8. Real-time feed replaced by delayed interval feed [Claude & Gemini]

  • Issue: async_get_realtime_usage_reads no longer queries the dedicated cws-real-time-ami-v1 endpoint. It now reuses the standard interval GraphQL query with a hardcoded 2-day window.
  • Impact: "Real-time" AMI endpoints often provide data with much lower latency (e.g., minutes) than standard billing interval endpoints (which often wait for next-day verification). Automations relying on near-real-time triggers may experience massive delays.

9. last: 26 bill limit truncates long date ranges [Claude]

  • Issue: _async_get_bill_cost_reads hardcodes last: 26 alongside the requested timeInterval.
  • Impact: If a user requests 3 years of historical data, the API will silently truncate the response to the most recent 26 bills (~2.1 years). Because there is no pagination logic implemented for bills, older data is permanently inaccessible.

10. Gas interval reads are unvalidated (explicit TODO) [Claude]

  • Issue: _get_interval_read_filters hardcodes CCF and DELIVERED for gas accounts, but includes a comment admitting this hasn't been tested against a live gas utility.
  • Impact: If the target utility uses THERM or a different ServiceQuantityIdentifier string, interval reads for gas accounts will silently return empty lists.

🟡 P3 — Minor Tech Debt & Edge Cases

11. Hardcoded 2-day realtime window causes redundant requests [Gemini]

  • Issue: async_get_realtime_usage_reads hardcodes a timedelta(days=2) lookback.
  • Impact: Because the underlying fetcher batches in strictly 24-hour windows, this guarantees exactly two GraphQL network requests every time the "real-time" function is polled, regardless of how much new data the user actually needs.

12. Floating point arithmetic used for monetary aggregation [Gemini]

  • [Gemini Disagrees with Claude's dismissal]
    • Claude's view: Using Decimal is false precision because the incoming JSON is already a float.
    • Gemini's view: While the incoming data is a float, continuously summing hundreds of interval floats (e.g. 0.012 + 0.014... over a month) exacerbates standard IEEE 754 arithmetic errors. round(..., 4) masks it, but it remains a code smell in financial data aggregation. It's a valid P3 tech debt item.
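For reference, the Decimal-based alternative debated in item 12 might look like the sketch below. The function name is hypothetical. Converting through `str()` takes the shortest decimal representation of each float, which assumes that representation is the amount the utility intended to send.

```python
from collections.abc import Iterable
from decimal import Decimal


def sum_costs_decimal(values: Iterable[float]) -> Decimal:
    """Sum monetary amounts exactly in base 10.

    Each float is converted via str() so that 0.012 becomes Decimal("0.012")
    rather than the float's full binary expansion.
    """
    return sum((Decimal(str(value)) for value in values), Decimal("0"))
```

This removes accumulation error in the sum itself, but it cannot recover precision already lost before the value was serialized as a JSON number, which is the substance of Claude's objection.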

13. ApiException raised with url="graphql" placeholder [Claude]

  • Issue: In _async_discover_service_point, error handling hardcodes the string "graphql" into the url parameter of the exception.
  • Impact: Produces confusing, unhelpful error logs when debugging discovery failures.

14. AggregateType.BILL in _aggregate_interval_reads is a silent no-op [Claude]

  • Issue: The else branch sets bucket = start_time, effectively creating one bucket per 15-minute read rather than actually aggregating anything to a billing level.
  • Impact: Harmless in practice because async_get_cost_reads intercepts BILL requests before they reach this function, but it is a latent logical gap if the internal method is reused later.

2 participants