[FLPATH-3327] Add GCP self-hosted/on-prem support by ydayagi · Pull Request #5943 · project-koku/koku

ydayagi · 2026-03-11T10:35:03Z

Add Django ORM models and PostgreSQL support for GCP line item data
storage in on-prem deployments without Trino/Hive.

Changes:

- Add GCPLineItem and GCPLineItemDaily Django models with partitioning
- Add migration 0346 for GCP line item tables
- Update GCP processor with self_hosted_line_item_model property
- Update GCP db accessor to use get_sql_folder_name()
- Add delete_self_hosted_data_by_source() for cleanup
- Copy PostgreSQL SQL files to self_hosted_sql/gcp/

https://issues.redhat.com/browse/FLPATH-3323

codecov · 2026-03-11T11:04:32Z

Codecov Report

❌ Patch coverage is 96.94224% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.4%. Comparing base (47db450) to head (a9b95d3).

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #5943     +/-   ##
=======================================
+ Coverage   94.4%   94.4%   +0.1%     
=======================================
  Files        362     368      +6     
  Lines      31988   32827    +839     
  Branches    3513    3532     +19     
=======================================
+ Hits       30185   30998    +813     
- Misses      1168    1194     +26     
  Partials     635     635

gemini-code-assist · 2026-03-11T11:21:20Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the platform's capabilities by enabling the storage and processing of Google Cloud Platform (GCP) cost and usage data within self-hosted, on-premise environments. It achieves this by integrating new Django ORM models with PostgreSQL table partitioning, providing a robust and scalable solution for managing large datasets without relying on external data lakes like Trino or Hive. The changes streamline data ingestion, summarization, and cleanup processes for cloud providers in an on-prem context.

Highlights

- GCP Self-Hosted Support: Introduced comprehensive support for Google Cloud Platform (GCP) line item data storage in on-premise deployments, eliminating the dependency on Trino/Hive.
- Django ORM Models & PostgreSQL Partitioning: Added new Django ORM models (GCPLineItem and GCPLineItemDaily) with PostgreSQL partitioning to efficiently manage and store GCP line item data locally.
- Unified Self-Hosted Data Processing: Refactored report parquet processors across AWS, Azure, and GCP to leverage a common base class for writing data to self-hosted PostgreSQL tables, ensuring consistent data ingestion and management.
- Enhanced Data Deletion Logic: Implemented new PostgreSQL functions and logic for date-scoped and manifest-ID based deletion of self-hosted data, allowing for more granular control over data retention and reprocessing.
- New PostgreSQL SQL Templates: Included a suite of new PostgreSQL SQL files for AWS, Azure, and GCP to handle daily summary, UI summary, and matched tag processing within the self-hosted environment.

Changelog

koku/koku/reportdb_accessor_postgres.py
- Added get_delete_day_by_manifestid_and_date_sql for date-scoped data deletion in PostgreSQL.
koku/masu/database/aws_report_db_accessor.py
- Updated SQL file path retrieval to use get_sql_folder_name().
- Added delete_self_hosted_data_by_source for cleaning up self-hosted AWS data.
koku/masu/database/azure_report_db_accessor.py
- Updated SQL file path retrieval to use get_sql_folder_name().
- Added delete_self_hosted_data_by_source for cleaning up self-hosted Azure data.
koku/masu/database/gcp_report_db_accessor.py
- Updated SQL file path retrieval to use get_sql_folder_name().
- Added delete_self_hosted_data_by_source for cleaning up self-hosted GCP data.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/0_prepare_daily_summary_tables.sql
- Added SQL to create temporary and summary tables for AWS OCP daily data processing.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/1_resource_matching_by_cluster.sql
- Added SQL for resource matching and data insertion into temporary tables for AWS OCP.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/2_summarize_data_by_cluster.sql
- Added SQL for summarizing AWS OCP data by cluster, including storage and network costs.
koku/masu/database/self_hosted_sql/aws/openshift/populate_daily_summary/3_reporting_ocpawscostlineitem_project_daily_summary_p.sql
- Added SQL to insert managed AWS OCP data into the final PostgreSQL summary table.
koku/masu/database/self_hosted_sql/aws/openshift/reporting_ocpaws_matched_tags.sql
- Added SQL for identifying matched tags between AWS and OCP resources.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_compute_summary_p.sql
- Added SQL for populating AWS OCP UI compute summary tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_by_account_p.sql
- Added SQL for populating AWS OCP UI cost summary by account tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_by_region_p.sql
- Added SQL for populating AWS OCP UI cost summary by region tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_by_service_p.sql
- Added SQL for populating AWS OCP UI cost summary by service tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_cost_summary_p.sql
- Added SQL for populating AWS OCP UI overall cost summary tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_database_summary_p.sql
- Added SQL for populating AWS OCP UI database summary tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_network_summary_p.sql
- Added SQL for populating AWS OCP UI network summary tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpaws_storage_summary_p.sql
- Added SQL for populating AWS OCP UI storage summary tables.
koku/masu/database/self_hosted_sql/aws/openshift/ui_summary/reporting_ocpawscostlineitem_project_daily_summary_p.sql
- Added SQL for inserting managed AWS OCP project daily summary data into PostgreSQL.
koku/masu/database/self_hosted_sql/aws/reporting_awscostentrylineitem_daily_summary.sql
- Added SQL for inserting AWS daily line item data into the summary table.
koku/masu/database/self_hosted_sql/aws/reporting_awscostentrylineitem_summary_by_ec2_compute_p.sql
- Added SQL for summarizing AWS EC2 compute costs.
koku/masu/database/self_hosted_sql/aws/reporting_ocpinfrastructure_provider_map.sql
- Added SQL for mapping OCP and AWS infrastructure providers.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/0_prepare_daily_summary_tables.sql
- Added SQL to create temporary and summary tables for Azure OCP daily data processing.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/1_resource_matching_by_cluster.sql
- Added SQL for resource matching and data insertion into temporary tables for Azure OCP.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/2_summarize_data_by_cluster.sql
- Added SQL for summarizing Azure OCP data by cluster, including storage and network costs.
koku/masu/database/self_hosted_sql/azure/openshift/populate_daily_summary/3_reporting_ocpazurecostlineitem_project_daily_summary_p.sql
- Added SQL to insert managed Azure OCP data into the final PostgreSQL summary table.
koku/masu/database/self_hosted_sql/azure/openshift/reporting_ocpazure_matched_tags.sql
- Added SQL for identifying matched tags between Azure and OCP resources.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_compute_summary_p.sql
- Added SQL for populating Azure OCP UI compute summary tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_by_account_p.sql
- Added SQL for populating Azure OCP UI cost summary by account tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_by_location_p.sql
- Added SQL for populating Azure OCP UI cost summary by location tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_by_service_p.sql
- Added SQL for populating Azure OCP UI cost summary by service tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_cost_summary_p.sql
- Added SQL for populating Azure OCP UI overall cost summary tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_database_summary_p.sql
- Added SQL for populating Azure OCP UI database summary tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_network_summary_p.sql
- Added SQL for populating Azure OCP UI network summary tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazure_storage_summary_p.sql
- Added SQL for populating Azure OCP UI storage summary tables.
koku/masu/database/self_hosted_sql/azure/openshift/ui_summary/reporting_ocpazurecostlineitem_project_daily_summary_p.sql
- Added SQL for inserting managed Azure OCP project daily summary data into PostgreSQL.
koku/masu/database/self_hosted_sql/azure/reporting_azurecostentrylineitem_daily_summary.sql
- Added SQL for inserting Azure daily line item data into the summary table.
koku/masu/database/self_hosted_sql/azure/reporting_ocpinfrastructure_provider_map.sql
- Added SQL for mapping OCP and Azure infrastructure providers.
koku/masu/database/self_hosted_sql/gcp/get_invoice_month_dates.sql
- Added SQL for fetching extended invoice month dates for GCP.
koku/masu/database/self_hosted_sql/gcp/openshift/populate_daily_summary/0_prepare_daily_summary_tables.sql
- Added SQL to create temporary and summary tables for GCP OCP daily data processing.
koku/masu/database/self_hosted_sql/gcp/openshift/populate_daily_summary/1_resource_matching_by_cluster.sql
- Added SQL for resource matching and data insertion into temporary tables for GCP OCP.
koku/masu/database/self_hosted_sql/gcp/openshift/populate_daily_summary/2_summarize_data_by_cluster.sql
- Added SQL for summarizing GCP OCP data by cluster, including storage and network costs.
koku/masu/database/self_hosted_sql/gcp/openshift/populate_daily_summary/3_reporting_ocpgcpcostlineitem_project_daily_summary_p.sql
- Added SQL to insert managed GCP OCP data into the final PostgreSQL summary table.
koku/masu/database/self_hosted_sql/gcp/openshift/reporting_ocpgcp_matched_tags.sql
- Added SQL for identifying matched tags between GCP and OCP resources.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_compute_summary_p.sql
- Added SQL for populating GCP OCP UI compute summary tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_cost_summary_by_account_p.sql
- Added SQL for populating GCP OCP UI cost summary by account tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_cost_summary_by_gcp_project_p.sql
- Added SQL for populating GCP OCP UI cost summary by GCP project tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_cost_summary_by_region_p.sql
- Added SQL for populating GCP OCP UI cost summary by region tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_cost_summary_by_service_p.sql
- Added SQL for populating GCP OCP UI cost summary by service tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_cost_summary_p.sql
- Added SQL for populating GCP OCP UI overall cost summary tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_database_summary_p.sql
- Added SQL for populating GCP OCP UI database summary tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_network_summary_p.sql
- Added SQL for populating GCP OCP UI network summary tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcp_storage_summary_p.sql
- Added SQL for populating GCP OCP UI storage summary tables.
koku/masu/database/self_hosted_sql/gcp/openshift/ui_summary/reporting_ocpgcpcostlineitem_project_daily_summary_p.sql
- Added SQL for inserting managed GCP OCP project daily summary data into PostgreSQL.
koku/masu/database/self_hosted_sql/gcp/reporting_gcpcostentrylineitem_daily_summary.sql
- Added SQL for inserting GCP daily line item data into the summary table.
koku/masu/database/self_hosted_sql/gcp/reporting_ocpinfrastructure_provider_map.sql
- Added SQL for mapping OCP and GCP infrastructure providers.
koku/masu/database/self_hosted_sql/openshift/cost_model/monthly_cost_gpu.sql
- Updated GPU product label extraction to use ::jsonb->>'nvidia_com_gpu_product' for consistency.
koku/masu/database/self_hosted_sql/openshift/ocp_special_matched_tags.sql
- Added SQL for aggregating special matched tags for OCP.
koku/masu/processor/aws/aws_report_parquet_processor.py
- Updated to track daily status and define _date_column.
- Implemented self_hosted_line_item_model property to return AWS Django models.
- Implemented get_table_names_for_delete to return relevant AWS table names.
- Implemented _prepare_dataframe_for_write to add manifestid to the dataframe.
koku/masu/processor/azure/azure_report_parquet_processor.py
- Updated to track daily status and define _date_column.
- Implemented self_hosted_line_item_model property to return Azure Django models.
- Implemented get_table_names_for_delete to return relevant Azure table names.
- Implemented _prepare_dataframe_for_write to add manifestid to the dataframe.
koku/masu/processor/gcp/gcp_report_parquet_processor.py
- Updated to track daily status and define _date_column.
- Implemented self_hosted_line_item_model property to return GCP Django models.
- Implemented get_table_names_for_delete to return relevant GCP table names.
- Implemented _prepare_dataframe_for_write to add manifestid to the dataframe.
koku/masu/processor/ocp/ocp_report_parquet_processor.py
- Defined _date_column for OCP data.
- Refactored write_to_self_hosted_table to delegate to the base class implementation.
koku/masu/processor/report_parquet_processor_base.py
- Introduced self_hosted_line_item_model property and _prepare_dataframe_for_write method for subclass implementation.
- Added a generic write_to_self_hosted_table method that uses Django models and PostgreSQL partitioning.
- Added get_table_names_for_delete and delete_day_postgres for manifest-ID based deletion in PostgreSQL.
koku/masu/test/database/test_aws_report_db_accessor.py
- Added tests for delete_self_hosted_data_by_source functionality.
koku/masu/test/database/test_azure_report_db_accessor.py
- Added tests for delete_self_hosted_data_by_source functionality.
koku/masu/test/database/test_gcp_report_db_accessor.py
- Added tests for delete_self_hosted_data_by_source functionality.
koku/masu/test/processor/aws/test_aws_report_parquet_processor.py
- Added tests for _is_daily flag, self_hosted_line_item_model, get_table_names_for_delete, _prepare_dataframe_for_write, and write_to_self_hosted_table.
koku/masu/test/processor/azure/test_azure_report_parquet_processor.py
- Added tests for _is_daily flag, self_hosted_line_item_model, get_table_names_for_delete, _prepare_dataframe_for_write, and write_to_self_hosted_table.
koku/masu/test/processor/gcp/test_gcp_report_parquet_processor.py
- Added tests for _is_daily flag, self_hosted_line_item_model, get_table_names_for_delete, _prepare_dataframe_for_write, and write_to_self_hosted_table.
koku/masu/test/processor/ocp/test_ocp_report_parquet_processor.py
- Updated test for write_to_self_hosted_table to reflect the base class refactoring.
koku/masu/test/processor/test_report_parquet_processor_base.py
- Added tests for base class methods: self_hosted_line_item_model, _prepare_dataframe_for_write, write_to_self_hosted_table (when no model), and get_table_names_for_delete.
koku/masu/util/aws/common.py
- Added get_table_names_for_delete to retrieve table names for PostgreSQL deletion.
- Added _delete_old_data_postgres_by_date for date-scoped deletion in PostgreSQL.
- Added _clear_csv_only for S3 CSV file deletion in on-prem environments.
- Updated get_or_clear_daily_s3_by_date to integrate new PostgreSQL deletion logic for on-prem deployments.
koku/reporting/migrations/0344_aws_line_item_models.py
- Added migration to create AWSLineItem and AWSLineItemDaily Django models with PostgreSQL partitioning.
koku/reporting/migrations/0345_azure_line_item_models.py
- Added migration to create AzureLineItem Django model with PostgreSQL partitioning.
koku/reporting/migrations/0346_gcp_line_item_models.py
- Added migration to create GCPLineItem and GCPLineItemDaily Django models with PostgreSQL partitioning.
koku/reporting/provider/aws/models.py
- Imported new self-hosted AWS line item models (AWSLineItem, AWSLineItemDaily).
koku/reporting/provider/aws/self_hosted_models.py
- Added new Django models AWSLineItem and AWSLineItemDaily for self-hosted PostgreSQL storage, including partitioning information.
koku/reporting/provider/azure/models.py
- Imported new self-hosted Azure line item model (AzureLineItem).
koku/reporting/provider/azure/self_hosted_models.py
- Added new Django model AzureLineItem for self-hosted PostgreSQL storage, including partitioning information.
koku/reporting/provider/gcp/models.py
- Imported new self-hosted GCP line item models (GCPLineItem, GCPLineItemDaily).
koku/reporting/provider/gcp/self_hosted_models.py
- Added new Django models GCPLineItem and GCPLineItemDaily for self-hosted PostgreSQL storage, including partitioning information.
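The processor entries in the changelog above describe a shared write path in the base class, with each provider supplying its model and any dataframe preparation. A minimal sketch of that shape follows. It assumes plain lists of dicts stand in for dataframes and string names stand in for Django models; the class bodies are hypothetical illustrations and are not taken from the PR.

```python
class ReportParquetProcessorBase:
    """Hypothetical stand-in for the shared processor base class."""

    def __init__(self, manifest_id, is_daily=False):
        self._manifest_id = manifest_id
        self._is_daily = is_daily

    @property
    def self_hosted_line_item_model(self):
        # The base class has no model; provider subclasses override this.
        return None

    def _prepare_dataframe_for_write(self, rows):
        # Default behaviour: write the rows unchanged.
        return rows

    def write_to_self_hosted_table(self, rows):
        model = self.self_hosted_line_item_model
        if model is None:
            # Nothing to write when the provider defines no model.
            return None
        prepared = self._prepare_dataframe_for_write(rows)
        # A real implementation would insert `prepared` into the model's table.
        return model, prepared


class GCPReportParquetProcessor(ReportParquetProcessorBase):
    """Hypothetical GCP subclass: picks a model and tags rows with manifestid."""

    @property
    def self_hosted_line_item_model(self):
        # Strings stand in for the GCPLineItemDaily / GCPLineItem models.
        return "GCPLineItemDaily" if self._is_daily else "GCPLineItem"

    def _prepare_dataframe_for_write(self, rows):
        return [{**row, "manifestid": self._manifest_id} for row in rows]
```

With this split, the OCP processor could delegate to the same base method, which matches the "delegate to the base class implementation" note in the changelog.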

Activity

No human activity has been recorded for this pull request yet.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces significant changes to add self-hosted/on-prem support for GCP, following a similar pattern for AWS and Azure. This includes new Django models for line item data, PostgreSQL-specific SQL queries for data processing, and refactoring of existing database accessors and processors to accommodate the on-premise logic. The changes are extensive and well-structured, with new tests covering the added functionality. My review focuses on potential security vulnerabilities, data type correctness for financial data, and opportunities for code consolidation to improve maintainability.

Note: Security Review did not run due to the size of the PR.

gemini-code-assist · 2026-03-11T11:29:45Z

+        return f"""
+            DELETE FROM "{schema_name}"."{table_name}"
+            WHERE source = '{source}'
+              AND year = '{year}'
+              AND month = '{month}'
+              AND manifestid != '{manifestid}'
+              AND {DATE_COLUMN} >= DATE '{processing_date}'
+        """


This method constructs a raw SQL query using an f-string, which is a potential SQL injection vulnerability. Although the values might be system-generated, it is a security best practice to use parameterized queries. Please consider modifying this method to return a SQL template and a list of parameters, and then use cursor.execute(sql, params) at the call site to safely execute the query. This would provide protection against SQL injection.
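One way to act on this suggestion is to return a SQL template plus a parameter list instead of a fully interpolated string. The sketch below is an assumption about how that could look, not the PR's code: the function name, the `quote_identifier` helper, and the `%s` placeholder style (as used by psycopg-style cursors) are all illustrative. Identifiers cannot be bound as parameters, so they are validated against a strict pattern before being quoted.

```python
import re

# Conservative allow-list for schema, table, and column names (an assumption).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Validate and double-quote a SQL identifier (hypothetical helper)."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return f'"{name}"'


def build_delete_by_manifest_and_date(
    schema_name, table_name, date_column, source, year, month, manifestid, processing_date
):
    """Return (sql, params); values are bound by the driver, never interpolated."""
    sql = (
        f"DELETE FROM {quote_identifier(schema_name)}.{quote_identifier(table_name)} "
        "WHERE source = %s AND year = %s AND month = %s "
        "AND manifestid != %s "
        f"AND {quote_identifier(date_column)} >= %s"
    )
    params = [source, year, month, manifestid, processing_date]
    return sql, params
```

At the call site this would be executed as `cursor.execute(sql, params)`, so a value containing a quote character is passed as data rather than parsed as SQL.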

gemini-code-assist · 2026-03-11T11:29:45Z

+    unblended_cost FLOAT,
+    blended_cost FLOAT,
+    savingsplan_effective_cost FLOAT,
+    calculated_amortized_cost FLOAT,


Using FLOAT for currency columns such as unblended_cost, blended_cost, savingsplan_effective_cost, and calculated_amortized_cost can lead to floating-point inaccuracies. For financial calculations, it is highly recommended to use DECIMAL or NUMERIC data types to ensure precision. This recommendation also applies to other temporary tables created in this pull request.

    unblended_cost DECIMAL,
    blended_cost DECIMAL,
    savingsplan_effective_cost DECIMAL,
    calculated_amortized_cost DECIMAL,
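The precision concern can be illustrated outside the database. The short Python sketch below compares binary floating point with exact decimal arithmetic on the same sum; the values are arbitrary and not drawn from the PR.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_total = 0.1 + 0.2

# Decimal keeps the base-10 digits exactly as written.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

The same effect accumulates when many small cost rows are summed, which is why exact numeric types are usually preferred for currency columns.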

gemini-code-assist · 2026-03-11T11:29:45Z

+    usage_pricing_unit = models.CharField(max_length=256, null=True)
+
+    # Cost columns
+    cost = models.FloatField(null=True)


Using FloatField for currency values like cost can introduce precision issues. It is a best practice to use DecimalField for all monetary values to maintain accuracy in financial calculations. This also applies to other new models for AWS and Azure introduced in this pull request.

Suggested change

-    cost = models.FloatField(null=True)
+    cost = models.DecimalField(max_digits=24, decimal_places=9, null=True)
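For readers unfamiliar with the two arguments in the suggested field, the sketch below mimics their effect using only the standard library: values are rounded to 9 decimal places and rejected if they need more than 24 digits in total. The `to_cost` helper is hypothetical, and the rounding mode is an assumption; Django's own validation may differ in detail.

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext

MAX_DIGITS = 24          # mirrors max_digits in the suggested field
DECIMAL_PLACES = 9       # mirrors decimal_places in the suggested field
QUANTUM = Decimal("0.000000001")  # one unit in the ninth decimal place


def to_cost(value: str) -> Decimal:
    """Round a cost to 9 decimal places and enforce a 24-digit budget."""
    with localcontext() as ctx:
        # Extra precision so quantize itself does not fail on large inputs.
        ctx.prec = 50
        quantized = Decimal(value).quantize(QUANTUM, rounding=ROUND_HALF_EVEN)
    if len(quantized.as_tuple().digits) > MAX_DIGITS:
        raise ValueError(f"{value} needs more than {MAX_DIGITS} digits")
    return quantized
```

With these settings a value may carry up to 15 digits before the decimal point and 9 after it.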

gemini-code-assist · 2026-03-11T11:29:45Z

+    def delete_self_hosted_data_by_source(self, provider_uuid):
+        """Delete data from all self-hosted tables by source UUID (for on-prem).
+
+        This deletes data from the line item tables when a source is deleted.
+
+        Args:
+            provider_uuid: The provider UUID to delete data for
+        """
+        from reporting.provider.aws.self_hosted_models import get_self_hosted_models
+
+        provider_uuid_str = str(provider_uuid)
+        total_deleted = 0
+
+        with schema_context(self.schema):
+            for model in get_self_hosted_models():
+                deleted_count, _ = model.objects.filter(source=provider_uuid_str).delete()
+
+                if deleted_count:
+                    LOG.info(
+                        log_json(
+                            msg="deleted self-hosted data by source",
+                            table=model._meta.db_table,
+                            provider_uuid=provider_uuid_str,
+                            deleted_count=deleted_count,
+                        )
+                    )
+                    total_deleted += deleted_count
+
+        return total_deleted


The method delete_self_hosted_data_by_source is nearly identical in AWSReportDBAccessor, AzureReportDBAccessor, and GCPReportDBAccessor. To improve maintainability and reduce code duplication, consider moving this logic to a shared base class, such as ReportDBAccessorBase. The provider-specific get_self_hosted_models function could be defined as an abstract method in the base class that subclasses are required to implement.
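The consolidation described here could look roughly like the following. This is a sketch under stated assumptions: the fake manager and queryset classes exist only so the example runs without Django, `schema_context` and logging are omitted, and the base-class layout is illustrative rather than the project's actual `ReportDBAccessorBase`.

```python
from abc import ABC, abstractmethod


class _FakeQuerySet:
    """Stand-in for a Django queryset; delete() returns (count, details)."""

    def __init__(self, rows, source):
        self._rows, self._source = rows, source

    def delete(self):
        matching = [row for row in self._rows if row["source"] == self._source]
        for row in matching:
            self._rows.remove(row)
        return len(matching), {}


class _FakeManager:
    """Stand-in for Model.objects with a filter(source=...) method."""

    def __init__(self, rows):
        self._rows = rows

    def filter(self, source):
        return _FakeQuerySet(self._rows, source)


class FakeGCPLineItem:
    objects = _FakeManager([{"source": "abc"}, {"source": "abc"}, {"source": "xyz"}])


class ReportDBAccessorBase(ABC):
    @abstractmethod
    def get_self_hosted_models(self):
        """Return the provider's self-hosted line item models."""

    def delete_self_hosted_data_by_source(self, provider_uuid):
        # Shared logic: one loop serves every provider.
        provider_uuid_str = str(provider_uuid)
        total_deleted = 0
        for model in self.get_self_hosted_models():
            deleted_count, _ = model.objects.filter(source=provider_uuid_str).delete()
            total_deleted += deleted_count
        return total_deleted


class GCPReportDBAccessor(ReportDBAccessorBase):
    def get_self_hosted_models(self):
        return [FakeGCPLineItem]
```

Each provider accessor then only declares which models it owns, and a fix to the deletion loop applies to AWS, Azure, and GCP at once.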

myersCody · 2026-05-05T19:47:27Z

@@ -0,0 +1,154 @@
+CREATE TABLE IF NOT EXISTS {{schema | sqlsafe}}.managed_aws_openshift_daily_temp


What is the migration strategy for these tables in the on-premise flow? From my perspective, it doesn't appear that we have one at all.

myersCody · 2026-05-05T20:03:55Z

+        """Return list of table names to delete from. Override in subclass if needed."""
+        return [self._table_name]
+
+    def delete_day_postgres(self, start_date, reportnumhours=None):


Why are we calling this delete day when we delete the entire month?

myersCody · 2026-05-05T20:04:18Z

+        # Delete from existing tables
+        total_deleted = 0
+        for table_name in existing_tables:
+            delete_sql = get_report_db_accessor().get_delete_day_by_manifestid_sql(


The name of this method is terrible considering it deletes the entire month.

    def get_delete_day_by_manifestid_sql(
        self, schema_name: str, table_name: str, source: str, year: str, month: str, manifestid: str
    ):
        """Return the SQL to delete data where manifestid doesn't match."""
        return f"""
            DELETE FROM "{schema_name}"."{table_name}"
            WHERE source = '{source}'
              AND year = '{year}'
              AND month = '{month}'
              AND manifestid != '{manifestid}'
        """

myersCody · 2026-05-05T20:11:04Z

+        """Return list of table names to delete from. Override in subclass if needed."""
+        return [self._table_name]
+
+    def delete_day_postgres(self, start_date, reportnumhours=None):


You pass in start_date here but don't seem to use it anywhere

myersCody · 2026-05-05T20:18:07Z

+        """Return list of table names to delete from. Override in subclass if needed."""
+        return [self._table_name]
+
+    def delete_day_postgres(self, start_date, reportnumhours=None):


I highly recommend we follow the call chain for this

    for csv_filename in file_list:
        # set start date based on data in the file being processed:
        if self.provider_type == Provider.PROVIDER_OCP:
            self.start_date = self.ocp_files_to_process[csv_filename.stem]["meta_reportdatestart"]
        self._delete_old_data(Path(csv_filename))
        if self.provider_type == Provider.PROVIDER_OCP and self.report_type is None:
            msg = "Unknown report type, skipping file processing"
            LOG.warning(
                log_json(
                    self.tracing_id,
                    msg=msg,
                    context=self.error_context,
                    filename=csv_filename,
                )
            )
            return

Inside of _delete_old_data:

    if settings.ONPREM:
        self._delete_old_data_postgres(filename)
    else:
        self._delete_old_data_trino(filename)

    def _delete_old_data_postgres(self, filename):
        """remove records with data older than the data in the file being processed"""
        # Get reportnumhours for OCP (will be None for non-OCP)
        reportnumhours = None
        if self.ocp_files_to_process:
            reportnumhours = int(self.ocp_files_to_process[filename.stem]["meta_reportnumhours"])

        # Processor handles deleting from all relevant tables (raw and daily for OCP)
        processor = self._get_report_processor(daily=False)
        processor.delete_day_postgres(self.start_date, reportnumhours)

Are you deleting a whole month of data each time we process a csv?
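The difference the reviewer is pointing at can be shown with a few in-memory rows. The sketch below is illustrative only: the row layout, the function names, and the dates are assumptions, and real deletion would happen in SQL. It compares a month-scoped predicate (matching the quoted `get_delete_day_by_manifestid_sql`) with one that also honours a start date.

```python
from datetime import date

# Five days of rows written under an earlier manifest (hypothetical data).
rows = [{"usage_date": date(2026, 5, day), "manifestid": "old"} for day in range(1, 6)]


def month_scoped_matches(rows, year, month, manifestid):
    """Rows a month-wide delete would remove: the whole month, other manifests."""
    return [
        row
        for row in rows
        if row["usage_date"].year == year
        and row["usage_date"].month == month
        and row["manifestid"] != manifestid
    ]


def date_scoped_matches(rows, start_date, manifestid):
    """Rows a date-scoped delete would remove: only on or after start_date."""
    return [
        row
        for row in rows
        if row["usage_date"] >= start_date and row["manifestid"] != manifestid
    ]


print(len(month_scoped_matches(rows, 2026, 5, "new")))         # 5
print(len(date_scoped_matches(rows, date(2026, 5, 4), "new")))  # 2
```

If a CSV only carries data from May 4 onward, the month-scoped predicate still removes May 1 through May 3, which is the behaviour being questioned.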

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add self-hosted PostgreSQL support for Azure provider, following the same pattern as AWS.

Changes:
- Add Django model for Azure line items (azure_line_items)
- Add migration for partitioned Azure line item table
- Add self_hosted_sql/azure/ directory with PostgreSQL-converted SQL files
- Update Azure processor with _date_column, self_hosted_line_item_model
- Update Azure db accessor to use get_sql_folder_name()
- Add delete_self_hosted_data_by_source() for cleanup

Jira: https://issues.redhat.com/browse/FLPATH-3323

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Yaron Dayagi <ydayagi@redhat.com>

Add Django ORM models and PostgreSQL support for GCP line item data storage in on-prem deployments without Trino/Hive.

Changes:
- Add GCPLineItem and GCPLineItemDaily Django models with partitioning
- Add migration 0346 for GCP line item tables
- Update GCP processor with self_hosted_line_item_model property
- Update GCP db accessor to use get_sql_folder_name()
- Add delete_self_hosted_data_by_source() for cleanup
- Copy PostgreSQL SQL files to self_hosted_sql/gcp/

https://issues.redhat.com/browse/FLPATH-3323

Signed-off-by: Yoni Dayagi <ydayagi@redhat.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

koku-ci-triager-bot · 2026-05-19T12:20:37Z

🤖 CI Triager — Diagnosis

Check: Red Hat Konflux / koku-ci / koku
PipelineRun: koku-ci-6k9xx
Root cause: The deploy-application task timed out waiting for the ephemeral Clowder environment to become ready. Multiple dependent services (sources-api, rbac, puptoo) failed to start, and the Clowder environment was locked. This is a transient infrastructure issue unrelated to this PR's code changes.
Evidence:

Warning  ClowdEnvLocked   clowdapp/koku     Clowder Environment [env-ephemeral-zdz28l] is locked
Warning  ClowdAppNotReady clowdapp/koku     ClowdApp [koku] is not ready
Warning  BackOff          pod/sources-api-svc-5d45d8d498-vpfc6  Back-off restarting failed container

ERROR: deploy failed: timed out waiting for ClowdApp-owned resources

Action: Re-trigger the koku-ci check. The ephemeral environment infrastructure was unhealthy at the time this run executed.

Generated automatically. Review before applying.

koku-ci-triager-bot · 2026-05-19T12:20:46Z

🤖 CI Triager — Warning

Check: Migration convention
Root cause: This PR adds 3 migration files, but the Koku convention requires at most 1 migration per PR. Multiple migrations should be squashed into a single file before merging.
Evidence:

koku/reporting/migrations/0351_awslineitem_awslineitemdaily_and_more.py
koku/reporting/migrations/0352_azurelineitem_managedazureopenshiftdaily_and_more.py
koku/reporting/migrations/0353_gcplineitem_gcplineitemdaily_and_more.py

Action: Squash the migrations into a single file:

python koku/manage.py squashmigrations <app_label> <first_migration> <last_migration>

Replace the three migration files with the generated squashed migration and update the dependencies accordingly.

Generated automatically. Review before applying.

ydayagi requested review from a team as code owners March 11, 2026 10:35

github-actions Bot added the smokes-required label Mar 11, 2026

ydayagi changed the title ~~Trino2pggcp~~ [FLPATH-3327] Add GCP self-hosted/on-prem support Mar 11, 2026

ydayagi added the gcp-smoke-tests, on-prem-processing, and flightpath-pr labels Mar 11, 2026

ydayagi force-pushed the trino2pggcp branch 2 times, most recently from de827f4 to 5aae3eb Compare March 11, 2026 10:48

gemini-code-assist Bot reviewed Mar 11, 2026

ydayagi force-pushed the trino2pggcp branch 2 times, most recently from 1e5ec1c to da2a4ac Compare March 11, 2026 19:46

myersCody added the on-hold label Mar 18, 2026

myersCody requested changes May 5, 2026

myersCody reviewed May 5, 2026

[FLPATH-3323] Add AWS self-hosted/on-prem support

493613a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ydayagi enabled auto-merge (squash) May 11, 2026 06:03

ydayagi force-pushed the trino2pggcp branch 2 times, most recently from 66a2c9c to 4c0bf43 Compare May 11, 2026 06:07

ydayagi force-pushed the trino2pggcp branch from 4c0bf43 to c953259 Compare May 11, 2026 06:43

ydayagi force-pushed the trino2pggcp branch from c953259 to a9b95d3 Compare May 11, 2026 08:09


3 participants
