Conversation
|
Caution Review failedThe pull request is closed. WalkthroughThe changes introduce support for specifying multiple CMIP6 models for data download, add a configurable Google Cloud Storage bucket parameter, and switch the Earth Engine data extraction from direct fetch to an export-to-bucket workflow. Associated configuration files, parameter schemas, and environment variables are updated to reflect these enhancements and improve flexibility. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Tool
participant EarthEngine
participant GCS as GoogleCloudStorage
User->>Tool: Initiate download_cmip6_series with models and bucket
Tool->>EarthEngine: Request data extraction for each model
EarthEngine->>GCS: Export feature collection as CSV to bucket
Tool->>GCS: Poll for export completion
GCS-->>Tool: Notify on export completion
Tool->>GCS: Download CSV file
Tool->>GCS: Delete CSV blob after download
Tool-->>User: Return DataFrame with results
Poem
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (3.3.7)src/earthengine.py📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (8)
src/download.py (3)
1-1: Drop the unusedtimedeltaimport
Keeping unused symbols hurts linting and build hygiene.-from datetime import datetime, timedelta +from datetime import datetime
7-7: Remove unused imports (ParamsCMIP6,EE_CMIP6_MODELS)
They are never referenced and trigger F401.-from params import ParamsCMIP6, map_dataset, map_variable, EE_CMIP6_MODELS, EE_CMIP6_MODELS_SHORT +from params import map_dataset, map_variable, EE_CMIP6_MODELS_SHORT
131-131: Drop the redundant f-string
No placeholders → plain string is cheaper and silences Ruff F541.-print(f"Earth Engine export task started. Check your Earth Engine Tasks tab (https://code.earthengine.google.com/tasks)") +print("Earth Engine export task started. Check your Earth Engine Tasks tab (https://code.earthengine.google.com/tasks)")docker-compose.yml (1)
13-15: Terminate file with a newline
Fails YAML-lintnew-line-at-end-of-file; add a trailing LF.src/carlos-download-optimized CMIP6 (4)
38-43: Do notpip installstd-lib modules
argparseships with Python; installing it is a no-op at best and a foot-gun at worst. Remove the call.
79-82: Unused importget_parameter
It’s imported but never used. Drop it to keep the script clean.
95-97: Shadowing theeemodule
ee = ee_auth_initialize()re-assigns the imported module; harmless but unnecessary. Simply callee_auth_initialize()without capturing its return.
125-136: Write pivoted CSV without the datetime index
pivot.sort_index().to_csv()will write the index twice (as unnamed + column). Useindex=Falseor reset the index first to get a clean CSV.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
src/all_precip_ssp585.csvis excluded by!**/*.csv
📒 Files selected for processing (3)
docker-compose.yml(1 hunks)src/carlos-download-optimized CMIP6(1 hunks)src/download.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/download.py (2)
src/params.py (3)
ParamsCMIP6(12-14)map_dataset(59-75)map_variable(38-57)src/credentials.py (1)
build_ee_credentials(17-30)
🪛 YAMLlint (1.37.1)
docker-compose.yml
[error] 15-15: no new line character at the end of file
(new-line-at-end-of-file)
🪛 Ruff (0.11.9)
src/download.py
1-1: datetime.timedelta imported but unused
Remove unused import: datetime.timedelta
(F401)
7-7: params.ParamsCMIP6 imported but unused
Remove unused import
(F401)
7-7: params.EE_CMIP6_MODELS imported but unused
Remove unused import
(F401)
25-28: Use ternary operator end_date = datetime.now() if kwargs.end_date is None else kwargs.end_date instead of if-else-block
Replace if-else-block with end_date = datetime.now() if kwargs.end_date is None else kwargs.end_date
(SIM108)
131-131: f-string without any placeholders
Remove extraneous f prefix
(F541)
src/download.py
Outdated
| except Exception as e: | ||
| print(f"Direct data fetch failed: {e}") | ||
| print("This usually happens when the dataset is too large to download directly via getInfo().") | ||
| print(f"Proceeding with export to Google Cloud Storage bucket: '{GCS_BUCKET_NAME}'...") | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid swallowing all exceptions
Catching Exception hides programming errors and complicates debugging. Consider catching ee.EEException / MemoryError only, or re-raising unexpected ones.
🤖 Prompt for AI Agents
In src/download.py around lines 110 to 114, the current code catches all
exceptions using a broad Exception clause, which can hide programming errors and
make debugging harder. Modify the except block to catch only specific exceptions
like ee.EEException and MemoryError. For any other unexpected exceptions,
re-raise them to avoid swallowing errors silently.
src/download.py
Outdated
| start_date = kwargs.start_date | ||
| if kwargs.end_date is None: | ||
| end_date = datetime.now() | ||
| else: | ||
| end_date = kwargs.end_date |
There was a problem hiding this comment.
Parse start_date / end_date defensively
json2args may deliver strings; calling .strftime on a str will raise.
-start_date = kwargs.start_date
-if kwargs.end_date is None:
- end_date = datetime.now()
-else:
- end_date = kwargs.end_date
+start_date = kwargs.start_date if isinstance(kwargs.start_date, datetime) \
+ else datetime.fromisoformat(kwargs.start_date)
+end_date = datetime.now() if kwargs.end_date in (None, 'None') \
+ else datetime.fromisoformat(kwargs.end_date)Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.11.9)
25-28: Use ternary operator end_date = datetime.now() if kwargs.end_date is None else kwargs.end_date instead of if-else-block
Replace if-else-block with end_date = datetime.now() if kwargs.end_date is None else kwargs.end_date
(SIM108)
🤖 Prompt for AI Agents
In src/download.py around lines 24 to 28, the start_date and end_date values
from kwargs may be strings, but the code assumes they are datetime objects and
calls datetime methods on them. To fix this, add checks to detect if start_date
or end_date are strings and parse them into datetime objects before using them.
This defensive parsing will prevent errors when these values come as strings
from json2args.
src/carlos-download-optimized CMIP6
Outdated
| @@ -0,0 +1,152 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid spaces in filenames
src/carlos-download-optimized CMIP6 complicates imports & tooling. Rename to snake_case (e.g. carlos_download_optimized_cmip6.py).
🤖 Prompt for AI Agents
In the file named "src/carlos-download-optimized CMIP6" at line 1, the filename
contains spaces which complicate imports and tooling. Rename the file to use
snake_case without spaces, for example, "carlos_download_optimized_cmip6.py",
and update any references or imports accordingly to reflect the new filename.
src/carlos-download-optimized CMIP6
Outdated
| def ee_auth_initialize(): | ||
| import ee | ||
| try: | ||
| ee.Initialize() | ||
| except Exception: | ||
| print("[INFO] Authenticating Earth Engine...") | ||
| ee.Authenticate(auth_mode='notebook') | ||
| ee.Initialize() |
There was a problem hiding this comment.
Interactive ee.Authenticate will block in headless Docker
Instead, reuse the service-account logic already present (build_ee_credentials) to ensure non-interactive auth.
🤖 Prompt for AI Agents
In src/carlos-download-optimized CMIP6 around lines 64 to 71, the current
ee_auth_initialize function uses ee.Authenticate interactively, which will block
in a headless Docker environment. To fix this, replace the interactive
authentication call with the existing service-account authentication logic from
build_ee_credentials to perform non-interactive authentication. Modify the
function to initialize Earth Engine using the service account credentials
without prompting for user input.
|
@mmaelicke changes made today with you and also in tool.yml with params.py info |
There was a problem hiding this comment.
Actionable comments posted: 4
🔭 Outside diff range comments (1)
src/earthengine.py (1)
183-198: Add timeout and better error handling for export task.The export polling loop could run indefinitely if the task gets stuck. Consider adding a timeout and more detailed error information.
+ max_wait_time = 3600 # 1 hour timeout + start_time = time.time() + # Wait for the export to complete while task.active(): + if time.time() - start_time > max_wait_time: + task.cancel() + raise RuntimeError(f"Export task timed out after {max_wait_time} seconds") logger.debug("Exporting to Cloud Storage...") time.sleep(30) # Check every 30 seconds if task.status()['state'] == 'COMPLETED': logger.info(f"Export completed successfully! Check the '{kwargs.bucket}' for {file_name}.csv") blob = bucket.blob(f"{file_name}.csv") with blob.open() as f: df = pd.read_csv(f) blob.delete() return df else: - logger.error(f"Export failed: {task.status()}") + error_message = task.status().get('error_message', 'Unknown error') + logger.error(f"Export failed with state: {task.status()['state']}, error: {error_message}") + raise RuntimeError(f"Earth Engine export failed: {error_message}")
🧹 Nitpick comments (2)
src/params.py (1)
15-15: Required bucket parameter for GCS export.The addition of the
bucketparameter is essential for the new Google Cloud Storage export workflow. Consider adding validation to ensure the bucket name follows GCS naming conventions.Consider adding a validator for the bucket name:
from pydantic import field_validator class ParamsCMIP6(Params): model: list[str] scenario: str bucket: str @field_validator('bucket') def validate_bucket_name(cls, v): # GCS bucket naming rules if not (3 <= len(v) <= 63): raise ValueError("Bucket name must be between 3 and 63 characters") if not v.replace('-', '').replace('_', '').replace('.', '').isalnum(): raise ValueError("Bucket name can only contain letters, numbers, hyphens, underscores, and dots") return vsrc/earthengine.py (1)
86-89: Simplify conditional assignment.Use a ternary operator for cleaner code.
- if kwargs.end_date is None: - end_date = datetime.now() - else: - end_date = kwargs.end_date + end_date = datetime.now() if kwargs.end_date is None else kwargs.end_date
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
in/inputs.json(1 hunks)src/credentials.py(2 hunks)src/earthengine.py(3 hunks)src/params.py(2 hunks)src/tool.yml(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- in/inputs.json
🧰 Additional context used
🪛 YAMLlint (1.37.1)
src/tool.yml
[warning] 103-103: too many spaces after comma
(commas)
🪛 Ruff (0.11.9)
src/earthengine.py
3-3: tempfile imported but unused
Remove unused import: tempfile
(F401)
11-11: params.EE_CMIP6_MODELS imported but unused
Remove unused import
(F401)
11-11: params.EE_CMIP6_MODELS_SHORT imported but unused
Remove unused import
(F401)
86-89: Use ternary operator end_date = datetime.now() if kwargs.end_date is None else kwargs.end_date instead of if-else-block
Replace if-else-block with end_date = datetime.now() if kwargs.end_date is None else kwargs.end_date
(SIM108)
🪛 Flake8 (7.2.0)
src/earthengine.py
[error] 3-3: 'tempfile' imported but unused
(F401)
[error] 11-11: 'params.EE_CMIP6_MODELS' imported but unused
(F401)
[error] 11-11: 'params.EE_CMIP6_MODELS_SHORT' imported but unused
(F401)
[error] 117-117: at least two spaces before inline comment
(E261)
[error] 121-121: at least two spaces before inline comment
(E261)
[error] 127-127: at least two spaces before inline comment
(E261)
🪛 Pylint (3.3.7)
src/earthengine.py
[error] 8-8: No name 'storage' in module 'google.cloud'
(E0611)
[error] 11-11: No name 'EE_CMIP6_MODELS_SHORT' in module 'params'
(E0611)
🔇 Additional comments (4)
src/credentials.py (1)
2-2: LGTM! Proper environment variable setup for Google credentials.The addition of
os.environ.setdefault()ensures the Earth Engine SDK can locate the service account credentials file without requiring explicit authentication in downstream code. Usingsetdefaultrespects any existing environment variable while providing a sensible default.Also applies to: 25-25
src/params.py (1)
13-13: Good change to support multiple model selection.Changing the model type to
list[str]aligns with the tool configuration update and enables users to download data for multiple CMIP6 models in a single request.src/tool.yml (1)
66-102: Well-structured parameter updates for multi-model support.The conversion of the model parameter to support multiple selections and the addition of the bucket parameter properly integrate with the refactored Earth Engine export workflow.
Also applies to: 112-115
src/earthengine.py (1)
105-129: Excellent refactoring of the data extraction logic.The new extraction function properly handles:
- Individual model/date combinations as separate features
- Proper use of the image's native projection and scale
- Null value handling with a sentinel value (-9999)
- Clear feature properties for downstream processing
This approach is much more scalable than the previous combined model approach.
| - NorESM2-MM | ||
| - TaiESM1 | ||
| - UKESM1-0-LL | ||
| default: ["ACCESS-CM2", "MPI-ESM1-2-HR", "EC-Earth3", "FGOALS-g3", "GISS-E2-1-G", "IPSL-CM6A-LR"] |
There was a problem hiding this comment.
Fix formatting: extra spaces after comma.
Remove the extra spaces after the comma to comply with YAML formatting standards.
- default: ["ACCESS-CM2", "MPI-ESM1-2-HR", "EC-Earth3", "FGOALS-g3", "GISS-E2-1-G", "IPSL-CM6A-LR"]
+ default: ["ACCESS-CM2", "MPI-ESM1-2-HR", "EC-Earth3", "FGOALS-g3", "GISS-E2-1-G", "IPSL-CM6A-LR"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| default: ["ACCESS-CM2", "MPI-ESM1-2-HR", "EC-Earth3", "FGOALS-g3", "GISS-E2-1-G", "IPSL-CM6A-LR"] | |
| default: ["ACCESS-CM2", "MPI-ESM1-2-HR", "EC-Earth3", "FGOALS-g3", "GISS-E2-1-G", "IPSL-CM6A-LR"] |
🧰 Tools
🪛 YAMLlint (1.37.1)
[warning] 103-103: too many spaces after comma
(commas)
🤖 Prompt for AI Agents
In src/tool.yml at line 103, the list of default values contains extra spaces
after commas. Remove these extra spaces so that the items are separated only by
commas without spaces, ensuring proper YAML formatting.
|
|
||
| # Reduce region to get the value at the point | ||
| value = image.reduceRegion( | ||
| reducer=ee.Reducer.first(), # Use first() for single point extraction |
There was a problem hiding this comment.
Fix inline comment formatting.
Add proper spacing before inline comments to comply with PEP 8.
- reducer=ee.Reducer.first(), # Use first() for single point extraction
+ reducer=ee.Reducer.first(), # Use first() for single point extraction
geometry=point,
scale=scale,
- crs=image.projection()
- ).get(band_name) # Get the value for the specific variable band
+ crs=image.projection()
+ ).get(band_name) # Get the value for the specific variable band
# Return a feature with date, model, and the extracted value
return ee.Feature(None, {
'date': date,
'model': model,
- 'value': ee.Algorithms.If(value, value, -9999) # Use -9999 for null values
+ 'value': ee.Algorithms.If(value, value, -9999) # Use -9999 for null values
})Also applies to: 121-121, 127-127
🧰 Tools
🪛 Flake8 (7.2.0)
[error] 117-117: at least two spaces before inline comment
(E261)
🤖 Prompt for AI Agents
In src/earthengine.py at lines 117, 121, and 127, the inline comments lack
proper spacing before the '#' symbol. To fix this, add a single space before
each inline comment to comply with PEP 8 formatting guidelines.
src/earthengine.py
Outdated
| from uuid import uuid4 | ||
| import tempfile |
There was a problem hiding this comment.
Remove unused imports.
Clean up the unused imports flagged by static analysis tools.
from pathlib import Path
from uuid import uuid4
-import tempfile
from datetime import datetime, timedelta
import pandas as pd
import ee
from json2args.logger import logger
from google.cloud import storage
import time
-from params import Params, ParamsCMIP6, map_dataset, map_variable, EE_CMIP6_MODELS, EE_CMIP6_MODELS_SHORT
+from params import Params, ParamsCMIP6, map_dataset, map_variableAlso applies to: 11-11
🧰 Tools
🪛 Ruff (0.11.9)
3-3: tempfile imported but unused
Remove unused import: tempfile
(F401)
🪛 Flake8 (7.2.0)
[error] 3-3: 'tempfile' imported but unused
(F401)
🤖 Prompt for AI Agents
In src/earthengine.py at lines 2 and 3, the imports 'uuid4' from 'uuid' and
'tempfile' are unused. Remove these import statements to clean up the code and
resolve static analysis warnings.
| # try: | ||
| # logger.info("Attempting to fetch data directly (might be slow/fail for large datasets)...") | ||
| # data_list = features_to_export.getInfo()['features'] | ||
|
|
||
| # parsed_data = [] | ||
| # for f in data_list: | ||
| # properties = f['properties'] | ||
| # parsed_data.append({ | ||
| # 'date': properties['date'], | ||
| # 'model': properties['model'], | ||
| # 'value': properties['value'] | ||
| # }) | ||
|
|
||
| # # Create a Pandas DataFrame | ||
| # df = pd.DataFrame(parsed_data) | ||
|
|
||
| # # Pivot the DataFrame to have models as columns | ||
| # df_pivot = df.pivot_table(index='date', columns='model', values='value') | ||
|
|
||
| # # Rename columns for clarity (e.g., 'tas_GFDL_ESM4') | ||
| # df_pivot.columns = [f"{kwargs.variable}_{col.replace('-', '_')}" for col in df_pivot.columns] | ||
|
|
||
| # # Reset index to make 'date' a regular column and sort by date | ||
| # df_pivot = df_pivot.reset_index() | ||
| # df_pivot['date'] = pd.to_datetime(df_pivot['date']) | ||
| # df_pivot = df_pivot.sort_values(by='date') | ||
|
|
||
| # return df_pivot | ||
|
|
||
| # except Exception as e: | ||
| # logger.info(FAIL_MESSAGE.format(e=e, bucket=kwargs.bucket)) | ||
| # pass | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Remove or document the commented code block.
Large blocks of commented code should be removed or properly documented if they serve as a reference. If this code represents an alternative approach that might be useful in the future, consider moving it to documentation or creating a separate method with a clear deprecation notice.
🤖 Prompt for AI Agents
In src/earthengine.py around lines 133 to 165, there is a large block of
commented-out code that should either be removed or properly documented. If this
code is an alternative approach that might be useful later, move it to a
separate method with a clear deprecation or usage note, or add detailed comments
explaining its purpose. Otherwise, remove the commented code to keep the
codebase clean.
Pushed my branch with changes in file 'download' and added a file called "carlos-download-optimized CMIP6" wich contains the steps i made to download de data in the docker container, installing ipython, dependencies check for ee-api, pandas, authentication, and others.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
.DS_Storefiles are ignored by version control.Chores