[FLPATH-3326] Add Azure self-hosted/on-prem support#5932
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the system by adding support for Azure self-hosted/on-prem deployments. It introduces necessary database models and migrations to handle Azure line items, following a similar approach to the existing AWS implementation. This expansion allows for more comprehensive cost management across different cloud providers in self-hosted environments. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request adds self-hosted PostgreSQL support for the Azure provider, mirroring the existing AWS implementation. This includes new Django models for Azure line items, migrations for partitioned tables, and extensive SQL scripts for data processing. The changes also involve a significant and beneficial refactoring of the ReportParquetProcessorBase to generalize the logic for writing to self-hosted tables, reducing code duplication across provider-specific processors. While the overall approach is solid, I've identified a critical bug in one of the new Azure SQL scripts that will prevent tag-based cost attribution from working correctly, and an inconsistency in tag matching logic compared to the AWS implementation. Addressing these issues will ensure the new functionality is robust and consistent.
Note: Security Review did not run due to the size of the PR.
cfd6f9f to
241ae0b
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5932 +/- ##
======================================
Coverage 94.4% 94.4%
======================================
Files 362 366 +4
Lines 31988 32583 +595
Branches 3513 3529 +16
======================================
+ Hits 30185 30756 +571
- Misses 1168 1190 +22
- Partials 635 637 +2 🚀 New features to boost your workflow:
|
|
/retest |
|
@dchorvat1 can you run our integration tests on these to confirm functionality, then move it out of draft. |
| @@ -0,0 +1,154 @@ | |||
| CREATE TABLE IF NOT EXISTS {{schema | sqlsafe}}.managed_aws_openshift_daily_temp | |||
There was a problem hiding this comment.
What is the migration strategy for these tables in the on premise flow? Cause it doesn't appear like we have one at all from my perspective.
|
Testing Instructions are required |
| # Delete from existing tables | ||
| total_deleted = 0 | ||
| for table_name in existing_tables: | ||
| delete_sql = get_report_db_accessor().get_delete_day_by_manifestid_sql( |
There was a problem hiding this comment.
The name of this method is terrible considering it deletes the entire month.
def get_delete_day_by_manifestid_sql(
self, schema_name: str, table_name: str, source: str, year: str, month: str, manifestid: str
):
"""Return the SQL to delete data where manifestid doesn't match."""
return f"""
DELETE FROM "{schema_name}"."{table_name}"
WHERE source = '{source}'
AND year = '{year}'
AND month = '{month}'
AND manifestid != '{manifestid}'
"""
| """Return list of table names to delete from. Override in subclass if needed.""" | ||
| return [self._table_name] | ||
|
|
||
| def delete_day_postgres(self, start_date, reportnumhours=None): |
There was a problem hiding this comment.
Why are we calling this delete day when we delete the entire month?
| """Return list of table names to delete from. Override in subclass if needed.""" | ||
| return [self._table_name] | ||
|
|
||
| def delete_day_postgres(self, start_date, reportnumhours=None): |
There was a problem hiding this comment.
You pass in start_date here but don't seem to use it anywhere?
| """Return list of table names to delete from. Override in subclass if needed.""" | ||
| return [self._table_name] | ||
|
|
||
| def delete_day_postgres(self, start_date, reportnumhours=None): |
There was a problem hiding this comment.
I highly recommend we follow the call chain for this
for csv_filename in file_list:
# set start date based on data in the file being processed:
if self.provider_type == Provider.PROVIDER_OCP:
self.start_date = self.ocp_files_to_process[csv_filename.stem]["meta_reportdatestart"]
self._delete_old_data(Path(csv_filename))
if self.provider_type == Provider.PROVIDER_OCP and self.report_type is None:
msg = "Unknown report type, skipping file processing"
LOG.warning(
log_json(
self.tracing_id,
msg=msg,
context=self.error_context,
filename=csv_filename,
)
)
return
Inside of _delete_old_data:
if settings.ONPREM:
self._delete_old_data_postgres(filename)
else:
self._delete_old_data_trino(filename)
def _delete_old_data_postgres(self, filename):
"""remove records with data older than the data in the file being processed"""
# Get reportnumhours for OCP (will be None for non-OCP)
reportnumhours = None
if self.ocp_files_to_process:
reportnumhours = int(self.ocp_files_to_process[filename.stem]["meta_reportnumhours"])
# Processor handles deleting from all relevant tables (raw and daily for OCP)
processor = self._get_report_processor(daily=False)
processor.delete_day_postgres(self.start_date, reportnumhours)
Are you deleting a whole month of data each time we process a csv?
e3fa6ae to
7930848
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add self-hosted PostgreSQL support for Azure provider, following the same pattern as AWS. Changes: - Add Django model for Azure line items (azure_line_items) - Add migration for partitioned Azure line item table - Add self_hosted_sql/azure/ directory with PostgreSQL-converted SQL files - Update Azure processor with _date_column, self_hosted_line_item_model - Update Azure db accessor to use get_sql_folder_name() - Add delete_self_hosted_data_by_source() for cleanup Jira: https://issues.redhat.com/browse/FLPATH-3323 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Yaron Dayagi <ydayagi@redhat.com>
|
🤖 CI Triager — Diagnosis Check: Red Hat Konflux / koku-ci / koku Action: Re-trigger the Generated automatically. Review before applying. |
|
🤖 CI Triager — Warning Check: Migration convention Action: Squash the migrations into a single file: Replace the two migration files with the generated squashed migration and update the dependencies accordingly. Generated automatically. Review before applying. |
[FLPATH-3326] Add Azure self-hosted/on-prem support