[FLPATH-3326] Add Azure self-hosted/on-prem support by ydayagi · Pull Request #5932 · project-koku/koku

ydayagi · 2026-03-04T19:35:43Z

[FLPATH-3326] Add Azure self-hosted/on-prem support

Add self-hosted PostgreSQL support for Azure provider, following the
same pattern as AWS.

Changes:
- Add Django model for Azure line items (azure_line_items)
- Add migration for partitioned Azure line item table

gemini-code-assist · 2026-03-04T19:37:18Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the system by adding support for Azure self-hosted/on-prem deployments. It introduces necessary database models and migrations to handle Azure line items, following a similar approach to the existing AWS implementation. This expansion allows for more comprehensive cost management across different cloud providers in self-hosted environments.

Highlights

Azure Self-Hosted Support: Adds support for self-hosted PostgreSQL for the Azure provider, mirroring the existing AWS pattern.
Django Model: Introduces a Django model for Azure line items (azure_line_items) to manage data.
Partitioned Table Migration: Includes a migration for a partitioned Azure line item table to improve query performance.

Changelog

koku/masu/database/aws_report_db_accessor.py
- Updated SQL folder path to use a dynamic method.
koku/masu/database/azure_report_db_accessor.py
- Updated SQL folder path to use a dynamic method.
- Added function to delete self-hosted data by source.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/0_prepare_daily_summary_tables.sql
- Added SQL script to create temporary and permanent tables for AWS OpenShift daily summary.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/1_resource_matching_by_cluster.sql
- Added SQL script to match AWS resources to OpenShift clusters.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/2_summarize_data_by_cluster.sql
- Added SQL script to summarize AWS data by OpenShift cluster.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/3_reporting_ocpawscostlineitem_project_daily_summary_p.sql
- Added SQL script to insert data into the partitioned AWS OpenShift cost summary table.
koku/masu/database/self_hosted_sql/aws/openshift/reporting_ocpaws_matched_tags.sql
- Added SQL script to report OpenShift AWS matched tags.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_compute_summary_p.sql
- Added SQL script to generate compute summary for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_by_account_p.sql
- Added SQL script to summarize costs by account for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_by_region_p.sql
- Added SQL script to summarize costs by region for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_by_service_p.sql
- Added SQL script to summarize costs by service for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_p.sql
- Added SQL script to generate cost summary for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_database_summary_p.sql
- Added SQL script to generate database summary for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_network_summary_p.sql
- Added SQL script to generate network summary for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_storage_summary_p.sql
- Added SQL script to generate storage summary for OpenShift on AWS.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpawscostlineitem_project_daily_summary_p.sql
- Added SQL script to insert data into the partitioned AWS OpenShift cost line item project daily summary table.
koku/masu/database/self_hosted_sql/aws/reporting_awscostentrylineitem_daily_summary.sql
- Added SQL script to insert data into the AWS cost entry line item daily summary table.
koku/masu/database/self_hosted_sql/aws/reporting_awscostentrylineitem_summary_by_ec2_compute_p.sql
- Added SQL script to summarize AWS cost entry line items by EC2 compute.
koku/masu/database/self_hosted_sql/aws/reporting_ocpinfrastructure_provider_map.sql
- Added SQL script to map OpenShift infrastructure providers to AWS.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/0_prepare_daily_summary_tables.sql
- Added SQL script to create temporary and permanent tables for Azure OpenShift daily summary.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/1_resource_matching_by_cluster.sql
- Added SQL script to match Azure resources to OpenShift clusters.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/2_summarize_data_by_cluster.sql
- Added SQL script to summarize Azure data by OpenShift cluster.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/3_reporting_ocpazurecostlineitem_project_daily_summary_p.sql
- Added SQL script to insert data into the partitioned Azure OpenShift cost summary table.
koku/masu/database/self_hosted_sql/azure/openshift/reporting_ocpazure_matched_tags.sql
- Added SQL script to report OpenShift Azure matched tags.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_compute_summary_p.sql
- Added SQL script to generate compute summary for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_by_account_p.sql
- Added SQL script to summarize costs by account for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_by_location_p.sql
- Added SQL script to summarize costs by location for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_by_service_p.sql
- Added SQL script to summarize costs by service for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_p.sql
- Added SQL script to generate cost summary for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_database_summary_p.sql
- Added SQL script to generate database summary for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_network_summary_p.sql
- Added SQL script to generate network summary for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_storage_summary_p.sql
- Added SQL script to generate storage summary for OpenShift on Azure.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazurecostlineitem_project_daily_summary_p.sql
- Added SQL script to insert data into the partitioned Azure OpenShift cost line item project daily summary table.
koku/masu/database/self_hosted_sql/azure/reporting_azurecostentrylineitem_daily_summary.sql
- Added SQL script to insert data into the Azure cost entry line item daily summary table.
koku/masu/database/self_hosted_sql/azure/reporting_ocpinfrastructure_provider_map.sql
- Added SQL script to map OpenShift infrastructure providers to Azure.
koku/masu/processor/aws/aws_report_parquet_processor.py
- Added self-hosted line item model support and table name retrieval.
- Added a method to prepare the dataframe for writing.
koku/masu/processor/azure/azure_report_parquet_processor.py
- Added self-hosted line item model support and table name retrieval.
- Added a method to prepare the dataframe for writing.
koku/masu/processor/ocp/ocp_report_parquet_processor.py
- Added a method to prepare the dataframe for writing.
koku/masu/processor/report_parquet_processor_base.py
- Implemented self-hosted table writing and deletion methods.
- Added a method to prepare the dataframe for writing.
koku/reporting/migrations/0344_aws_line_item_models.py
- Added Django models for AWS line items (self-hosted/on-prem).
koku/reporting/migrations/0345_azure_line_item_models.py
- Added Django models for Azure line items (self-hosted/on-prem).
koku/reporting/provider/aws/models.py
- Imported self-hosted models for Django registration.
koku/reporting/provider/aws/self_hosted_models.py
- Added Django models for AWS line items (self-hosted/on-prem).
koku/reporting/provider/azure/models.py
- Imported self-hosted models for Django registration.
koku/reporting/provider/azure/self_hosted_models.py
- Added Django models for Azure line items (self-hosted/on-prem).

Activity

Added Django model for Azure line items (azure_line_items)
Added migration for partitioned Azure line item table

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds self-hosted PostgreSQL support for the Azure provider, mirroring the existing AWS implementation. This includes new Django models for Azure line items, migrations for partitioned tables, and extensive SQL scripts for data processing. The changes also involve a significant and beneficial refactoring of the ReportParquetProcessorBase to generalize the logic for writing to self-hosted tables, reducing code duplication across provider-specific processors. While the overall approach is solid, I've identified a critical bug in one of the new Azure SQL scripts that will prevent tag-based cost attribution from working correctly, and an inconsistency in tag matching logic compared to the AWS implementation. Addressing these issues will ensure the new functionality is robust and consistent.

Note: Security Review did not run due to the size of the PR.

codecov · 2026-03-04T20:27:18Z

Codecov Report

❌ Patch coverage is 96.36651% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.4%. Comparing base (47db450) to head (4adaecd).

Additional details and impacted files

@@          Coverage Diff           @@
##            main   #5932    +/-   ##
======================================
  Coverage   94.4%   94.4%            
======================================
  Files        362     366     +4     
  Lines      31988   32583   +595     
  Branches    3513    3529    +16     
======================================
+ Hits       30185   30756   +571     
- Misses      1168    1190    +22     
- Partials     635     637     +2
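
The figures in this report are mutually consistent, which a few lines of arithmetic can confirm. The sketch below assumes coverage is computed as hits divided by lines, so that misses and partials both count against it. The patch size of 633 lines is an inference from the reported percentage and does not appear in the report itself.

```python
# Counts copied from the coverage diff above.
base = {"lines": 31988, "hits": 30185, "misses": 1168, "partials": 635}
head = {"lines": 32583, "hits": 30756, "misses": 1190, "partials": 637}


def coverage_pct(report):
    """Percentage of lines that are fully covered (hits / lines)."""
    return 100 * report["hits"] / report["lines"]


# Each column should add up to its line total.
for report in (base, head):
    assert report["hits"] + report["misses"] + report["partials"] == report["lines"]

base_pct = coverage_pct(base)
head_pct = coverage_pct(head)
print(f"base {base_pct:.1f}%  head {head_pct:.1f}%")

# Assumption: 633 changed lines, inferred from "96.36651% with 23 lines missing".
patch_lines = 633
patch_missing = 23
patch_pct = 100 * (patch_lines - patch_missing) / patch_lines
print(f"patch {patch_pct:.5f}%")
```

Both base and head round to 94.4%, which matches the unchanged project coverage shown in the diff.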

ydayagi · 2026-03-05T09:34:45Z

/retest

myersCody · 2026-03-10T14:38:44Z

@dchorvat1 can you run our integration tests on this PR to confirm functionality, then move it out of draft?

myersCody · 2026-05-05T19:46:32Z

@@ -0,0 +1,154 @@
+CREATE TABLE IF NOT EXISTS {{schema | sqlsafe}}.managed_aws_openshift_daily_temp


What is the migration strategy for these tables in the on-prem flow? From my perspective, it doesn't appear that we have one at all.

myersCody · 2026-05-05T19:51:42Z

Testing Instructions are required

myersCody · 2026-05-05T20:02:39Z

+        # Delete from existing tables
+        total_deleted = 0
+        for table_name in existing_tables:
+            delete_sql = get_report_db_accessor().get_delete_day_by_manifestid_sql(


The name of this method is terrible considering it deletes the entire month.

    def get_delete_day_by_manifestid_sql(
        self, schema_name: str, table_name: str, source: str, year: str, month: str, manifestid: str
    ):
        """Return the SQL to delete data where manifestid doesn't match."""
        return f"""
            DELETE FROM "{schema_name}"."{table_name}"
            WHERE source = '{source}'
            AND year = '{year}'
            AND month = '{month}'
            AND manifestid != '{manifestid}'
        """
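
To make the scope of that statement concrete, here is a small sketch that runs an equivalent DELETE against an in-memory SQLite table. The table name `line_items`, the `day` column, and the sample rows are assumptions for illustration; the real tables live in PostgreSQL and are not shown in this thread. The filter is the same as in the quoted method: source, year, and month match, and `manifestid` differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_items (source TEXT, year TEXT, month TEXT, day TEXT, manifestid TEXT)"
)
# Hypothetical sample data: an old manifest (100) and a new one (200).
rows = [
    ("src-1", "2026", "03", "01", "100"),
    ("src-1", "2026", "03", "02", "100"),
    ("src-1", "2026", "03", "15", "100"),
    ("src-1", "2026", "03", "15", "200"),
    ("src-1", "2026", "04", "01", "100"),
]
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?, ?, ?)", rows)


def delete_other_manifests(conn, source, year, month, manifestid):
    """Delete rows for the given month that belong to any other manifest."""
    cur = conn.execute(
        "DELETE FROM line_items "
        "WHERE source = ? AND year = ? AND month = ? AND manifestid != ?",
        (source, year, month, manifestid),
    )
    return cur.rowcount


deleted = delete_other_manifests(conn, "src-1", "2026", "03", "200")
remaining = conn.execute(
    "SELECT month, day, manifestid FROM line_items ORDER BY month, day"
).fetchall()
print(deleted)    # 3
print(remaining)  # [('03', '15', '200'), ('04', '01', '100')]
```

No day appears in the WHERE clause, so every March row from the older manifest is removed regardless of its date. Unlike the quoted f-string, this sketch binds values as query parameters rather than interpolating them.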

myersCody · 2026-05-05T20:02:56Z

+        """Return list of table names to delete from. Override in subclass if needed."""
+        return [self._table_name]
+
+    def delete_day_postgres(self, start_date, reportnumhours=None):


Why are we calling this delete day when we delete the entire month?

myersCody · 2026-05-05T20:09:28Z

+        """Return list of table names to delete from. Override in subclass if needed."""
+        return [self._table_name]
+
+    def delete_day_postgres(self, start_date, reportnumhours=None):


You pass in start_date here but don't seem to use it anywhere?

myersCody · 2026-05-05T20:17:15Z

+        """Return list of table names to delete from. Override in subclass if needed."""
+        return [self._table_name]
+
+    def delete_day_postgres(self, start_date, reportnumhours=None):


I highly recommend we follow the call chain for this:

    for csv_filename in file_list:
        # set start date based on data in the file being processed:
        if self.provider_type == Provider.PROVIDER_OCP:
            self.start_date = self.ocp_files_to_process[csv_filename.stem]["meta_reportdatestart"]
        self._delete_old_data(Path(csv_filename))
        if self.provider_type == Provider.PROVIDER_OCP and self.report_type is None:
            msg = "Unknown report type, skipping file processing"
            LOG.warning(
                log_json(
                    self.tracing_id,
                    msg=msg,
                    context=self.error_context,
                    filename=csv_filename,
                )
            )
            return

Inside of _delete_old_data:

    if settings.ONPREM:
        self._delete_old_data_postgres(filename)
    else:
        self._delete_old_data_trino(filename)

    def _delete_old_data_postgres(self, filename):
        """remove records with data older than the data in the file being processed"""
        # Get reportnumhours for OCP (will be None for non-OCP)
        reportnumhours = None
        if self.ocp_files_to_process:
            reportnumhours = int(self.ocp_files_to_process[filename.stem]["meta_reportnumhours"])
        # Processor handles deleting from all relevant tables (raw and daily for OCP)
        processor = self._get_report_processor(daily=False)
        processor.delete_day_postgres(self.start_date, reportnumhours)

Are you deleting a whole month of data each time we process a csv?
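
One way to explore that question is to trace the chain with stand-ins. The sketch below is not koku code: `FakeProcessor` and `FakeConverter` are hypothetical stubs that mirror only the control flow quoted above, with the on-prem branch hardcoded, and they record how often the delete is invoked.

```python
from pathlib import Path


class FakeProcessor:
    """Hypothetical stand-in for the report processor; records delete calls."""

    def __init__(self):
        self.delete_calls = []

    def delete_day_postgres(self, start_date, reportnumhours=None):
        self.delete_calls.append((start_date, reportnumhours))


class FakeConverter:
    """Hypothetical stand-in mirroring the quoted loop (on-prem branch only)."""

    def __init__(self, processor, start_date):
        self.processor = processor
        self.start_date = start_date
        self.ocp_files_to_process = {}  # empty for non-OCP providers

    def _delete_old_data_postgres(self, filename):
        reportnumhours = None
        if self.ocp_files_to_process:
            reportnumhours = int(self.ocp_files_to_process[filename.stem]["meta_reportnumhours"])
        self.processor.delete_day_postgres(self.start_date, reportnumhours)

    def convert(self, file_list):
        for csv_filename in file_list:
            self._delete_old_data_postgres(Path(csv_filename))


processor = FakeProcessor()
converter = FakeConverter(processor, start_date="2026-03-01")
converter.convert(["part_0.csv", "part_1.csv", "part_2.csv"])
print(len(processor.delete_calls))  # 3, one delete call per CSV file
```

If the real loop matches the quoted one, the delete is issued once per file. Whether the later calls actually remove rows depends on the DELETE filter, which this sketch does not model.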

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add self-hosted PostgreSQL support for Azure provider, following the
same pattern as AWS.

Changes:
- Add Django model for Azure line items (azure_line_items)
- Add migration for partitioned Azure line item table
- Add self_hosted_sql/azure/ directory with PostgreSQL-converted SQL files
- Update Azure processor with _date_column, self_hosted_line_item_model
- Update Azure db accessor to use get_sql_folder_name()
- Add delete_self_hosted_data_by_source() for cleanup

Jira: https://issues.redhat.com/browse/FLPATH-3323

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Yaron Dayagi <ydayagi@redhat.com>
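
The commit message mentions `get_sql_folder_name()` without showing it. A minimal sketch of what such a switch might look like follows. The function body, the `onprem` parameter (standing in for `settings.ONPREM`), and the `trino_sql` folder name are assumptions; only `self_hosted_sql` comes from the file paths listed in the changelog.

```python
from pathlib import PurePosixPath


def get_sql_folder_name(onprem: bool) -> str:
    """Hypothetical: choose the SQL folder for the current deployment mode."""
    # "self_hosted_sql" appears in this PR's paths; "trino_sql" is assumed.
    return "self_hosted_sql" if onprem else "trino_sql"


def sql_path(onprem: bool, provider: str, filename: str) -> str:
    """Build a path relative to koku/masu/database for a provider SQL file."""
    return str(PurePosixPath(get_sql_folder_name(onprem)) / provider / filename)


azure_path = sql_path(True, "azure", "reporting_azurecostentrylineitem_daily_summary.sql")
print(azure_path)
```

Routing every lookup through one function would let the accessors stay unaware of which backend they are running against, which may be why the AWS and Azure accessors were both updated.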

koku-ci-triager-bot · 2026-05-19T12:21:01Z

🤖 CI Triager — Diagnosis

Check: Red Hat Konflux / koku-ci / koku
PipelineRun: koku-ci-gs6dp
Root cause: The deploy-application task timed out waiting for the ephemeral Clowder environment to become ready. Multiple dependent services (sources-api, rbac, puptoo, trino) failed to start, and the Clowder environment was locked. This is a transient infrastructure issue unrelated to this PR's code changes.
Evidence:

Warning  ClowdEnvLocked   clowdapp/koku     Clowder Environment [env-ephemeral-2ruzpr] is locked
Warning  ClowdAppNotReady clowdapp/koku     ClowdApp [koku] is not ready
Warning  BackOff          pod/sources-api-svc-5c9949777f-vhq5b  Back-off restarting failed container

ERROR: deploy failed: timed out waiting for ClowdApp-owned resources

Action: Re-trigger the koku-ci check. The ephemeral environment infrastructure was unhealthy at the time this run executed.

Generated automatically. Review before applying.

koku-ci-triager-bot · 2026-05-19T12:21:02Z

🤖 CI Triager — Warning

Check: Migration convention
Root cause: This PR adds 2 migration files, but the Koku convention requires at most 1 migration per PR. Multiple migrations should be squashed into a single file before merging.
Evidence:

koku/reporting/migrations/0351_awslineitem_awslineitemdaily_and_more.py
koku/reporting/migrations/0352_azurelineitem_managedazureopenshiftdaily_and_more.py

Action: Squash the migrations into a single file:

python koku/manage.py squashmigrations <app_label> <first_migration> <last_migration>

Replace the two migration files with the generated squashed migration and update the dependencies accordingly.
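
The convention check itself is simple to express. The sketch below is a hypothetical reimplementation, not the triager bot's actual code: it assumes the rule is "at most one file matching `migrations/NNNN_*.py` per PR" and takes the changed-file list as plain strings.

```python
import re

# Assumed pattern for Django migration files: <app>/migrations/<4 digits>_<name>.py
MIGRATION_RE = re.compile(r"(^|/)migrations/\d{4}_\w+\.py$")


def find_migrations(changed_files):
    """Return the changed files that look like Django migrations."""
    return [path for path in changed_files if MIGRATION_RE.search(path)]


def violates_convention(changed_files, limit=1):
    """True when the PR adds more migrations than the assumed limit."""
    return len(find_migrations(changed_files)) > limit


changed = [
    "koku/reporting/migrations/0351_awslineitem_awslineitemdaily_and_more.py",
    "koku/reporting/migrations/0352_azurelineitem_managedazureopenshiftdaily_and_more.py",
    "koku/reporting/provider/azure/self_hosted_models.py",
]
print(violates_convention(changed))  # True
```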

Generated automatically. Review before applying.

ydayagi requested review from a team as code owners March 4, 2026 19:35

gemini-code-assist Bot reviewed Mar 4, 2026

Comment thread ...abase/self_hosted_sql/azure/openshift/populate_daily_summary/2_summarize_data_by_cluster.sql Outdated

Comment thread koku/masu/database/self_hosted_sql/azure/openshift/reporting_ocpazure_matched_tags.sql Outdated

ydayagi force-pushed the trino2pgazure branch 2 times, most recently from cfd6f9f to 241ae0b Compare March 4, 2026 20:13

ydayagi removed the on-prem-processing label (pr_check will deploy and run the on-prem data pipeline processing flow) Mar 5, 2026

ydayagi changed the title ~~[FLPATH-3323] Add Azure self-hosted/on-prem support~~ [FLPATH-3326] Add Azure self-hosted/on-prem support Mar 5, 2026

lcouzens assigned lcouzens and unassigned lcouzens Mar 10, 2026

lcouzens added the flightpath-pr label (Issues being worked on by the flight path team) Mar 10, 2026

myersCody marked this pull request as draft March 10, 2026 14:37

ydayagi force-pushed the trino2pgazure branch from 241ae0b to 69edb53 Compare March 10, 2026 17:11

myersCody added the on-hold label Mar 18, 2026

myersCody requested changes May 5, 2026

myersCody reviewed May 5, 2026

ydayagi force-pushed the trino2pgazure branch 3 times, most recently from e3fa6ae to 7930848 Compare May 11, 2026 05:47

[FLPATH-3323] Add AWS self-hosted/on-prem support

493613a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ydayagi force-pushed the trino2pgazure branch from 7930848 to 3f9a2f5 Compare May 11, 2026 05:58

ydayagi force-pushed the trino2pgazure branch from 3f9a2f5 to 4adaecd Compare May 11, 2026 06:43

4 participants
