🧹 Centralize duplicated helper functions in alpha191/utils.py#19
🧹 Centralize duplicated helper functions in alpha191/utils.py#19Uwater1 wants to merge 1 commit into
Conversation
This commit centralizes the `get_alpha_func` and `get_stock_codes` helper functions that were duplicated across multiple test and assessment scripts. The new implementation in `alpha191/utils.py` is more robust, supporting different naming conventions (alphaXXX vs alpha_XXX) and providing unified error handling. Updated scripts: - speedtest.py - fulltest.py - calculate_covariance.py - select_alphas.py - run_alpha_tests.py - ICtest.py - grouptest.py Added unit tests for the new utility functions in tests/test_alphas.py. Co-authored-by: Uwater1 <194621470+Uwater1@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the codebase by centralizing commonly used helper functions into a dedicated utility module. This change streamlines the process of retrieving alpha functions and stock codes, eliminating duplicated logic across various analysis scripts. The primary impact is improved code consistency, easier maintenance, and enhanced flexibility for future development, as all scripts now rely on a single, robust implementation of these core utilities. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is a great refactoring effort that centralizes duplicated helper functions, get_alpha_func and get_stock_codes, into alpha191/utils.py. This significantly improves code maintainability and readability by removing redundant code from multiple scripts. The new centralized functions are also more robust. I've added a couple of suggestions to further improve the new utility function and its tests for better error handling, clarity, and test reliability.
Note: Security Review did not run due to the size of the PR.
| def get_alpha_func(alpha_id: Union[int, str], use_df: bool = False, ignore_errors: bool = False) -> Optional[Any]: | ||
| """ | ||
| Get the alpha function by number or name. | ||
|
|
||
| Args: | ||
| alpha_id: Alpha number (e.g., 17) or name (e.g., "alpha017") | ||
| use_df: If True, return the function that takes a DataFrame (alpha_XXX). | ||
| If False, return the function that takes code/benchmark (alphaXXX). | ||
| ignore_errors: If True, return None on failure instead of raising ValueError. | ||
|
|
||
| Returns: | ||
| The alpha function or None if not found and ignore_errors is True. | ||
| """ | ||
| import importlib | ||
| try: | ||
| alpha_name = format_alpha_name(str(alpha_id)) | ||
| module = importlib.import_module(f"alpha191.{alpha_name}") | ||
|
|
||
| if use_df: | ||
| # Try alpha_XXX first (preferred for DataFrame input) | ||
| func_name = f"alpha_{int(alpha_name[5:]):03d}" | ||
| if hasattr(module, func_name): | ||
| return getattr(module, func_name) | ||
| # Fallback to alphaXXX | ||
| if hasattr(module, alpha_name): | ||
| return getattr(module, alpha_name) | ||
| else: | ||
| # Try alphaXXX first (preferred for code/benchmark input) | ||
| if hasattr(module, alpha_name): | ||
| return getattr(module, alpha_name) | ||
| # Fallback to alpha_XXX | ||
| func_name = f"alpha_{int(alpha_name[5:]):03d}" | ||
| if hasattr(module, func_name): | ||
| return getattr(module, func_name) | ||
|
|
||
| except (ImportError, ModuleNotFoundError, ValueError): | ||
| pass | ||
|
|
||
| if ignore_errors: | ||
| return None | ||
| raise ValueError(f"Alpha function for '{alpha_id}' not found") |
There was a problem hiding this comment.
This function can be improved for clarity, robustness, and to follow best practices.
- Local Import:
import importlibis inside the function. According to PEP 8, imports should be at the top of the file unless there's a specific reason like avoiding circular dependencies. Please moveimport importlibto the top ofalpha191/utils.py. - DRY Principle: The logic for finding the function when
use_dfisTrueorFalseis very similar and can be refactored to avoid repetition. You can define the search order based on theuse_dfflag and then iterate through it. - Error Handling: The
except (ImportError, ModuleNotFoundError, ValueError): passblock is too broad. It catchesValueErrorfromformat_alpha_namebut then raises a generic "not found" error, hiding the original, more specific error. It's better to handle exceptions more granularly to provide clearer error messages.
Here is a suggested refactoring that addresses points 2 and 3 (please also address point 1 separately by moving the import):
def get_alpha_func(alpha_id: Union[int, str], use_df: bool = False, ignore_errors: bool = False) -> Optional[Any]:
"""
Get the alpha function by number or name.
Args:
alpha_id: Alpha number (e.g., 17) or name (e.g., "alpha017")
use_df: If True, return the function that takes a DataFrame (alpha_XXX).
If False, return the function that takes code/benchmark (alphaXXX).
ignore_errors: If True, return None on failure instead of raising ValueError.
Returns:
The alpha function or None if not found and ignore_errors is True.
"""
try:
alpha_name = format_alpha_name(str(alpha_id))
except ValueError as e:
if ignore_errors:
return None
raise ValueError(f"Invalid alpha name format for '{alpha_id}'") from e
try:
module = importlib.import_module(f"alpha191.{alpha_name}")
except (ImportError, ModuleNotFoundError) as e:
if ignore_errors:
return None
raise ValueError(f"Could not import module for alpha '{alpha_id}'") from e
df_func_name = f"alpha_{int(alpha_name[5:]):03d}"
code_func_name = alpha_name
# Determine search order based on use_df flag
search_order = [df_func_name, code_func_name] if use_df else [code_func_name, df_func_name]
for func_name in search_order:
if hasattr(module, func_name):
return getattr(module, func_name)
if ignore_errors:
return None
raise ValueError(f"Alpha function for '{alpha_id}' not found in module '{alpha_name}'")| def test_get_stock_codes(self): | ||
| """Test get_stock_codes lists files correctly.""" | ||
| from alpha191.utils import get_stock_codes | ||
| from pathlib import Path | ||
|
|
||
| # This test depends on the existence of bao/hs300 directory | ||
| if Path("bao/hs300").exists(): | ||
| codes = get_stock_codes("hs300") | ||
| self.assertIsInstance(codes, list) | ||
| if len(codes) > 0: | ||
| self.assertTrue(isinstance(codes[0], str)) | ||
| else: | ||
| with self.assertRaises(FileNotFoundError): | ||
| get_stock_codes("nonexistent_benchmark") |
There was a problem hiding this comment.
This test depends on the actual file system state (the existence of the bao/hs300 directory), which makes it brittle and not fully reliable in different environments. A better practice for unit tests is to isolate them from external dependencies like the file system by using mocking.
You can use unittest.mock.patch to mock pathlib.Path and its methods (exists, glob) to simulate the presence or absence of directories and files. This will make your test deterministic and more robust, ensuring it covers both success and failure cases regardless of the environment.
For example, you could create separate test methods for the found and not-found cases using mocking:
from unittest.mock import patch, MagicMock
# ... inside your test class ...
@patch('alpha191.utils.PROJECT_ROOT')
def test_get_stock_codes_found(self, mock_project_root):
# ... setup mock for existing directory with files ...
# ... assert correct codes are returned ...
@patch('alpha191.utils.PROJECT_ROOT')
def test_get_stock_codes_not_found(self, mock_project_root):
# ... setup mock for non-existent directory ...
# ... assert FileNotFoundError is raised ...
Consolidated duplicated
get_alpha_funcandget_stock_codeshelper functions from several scripts intoalpha191/utils.py. The centralized functions are more robust and flexible, accommodating the different needs of various scripts (e.g., DataFrame vs code/benchmark input, varying error handling). This improvement enhances the maintainability and readability of the codebase.PR created automatically by Jules for task 7650666659604456366 started by @Uwater1