COde Duplication FIxed!!#74
Conversation
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Reviewer's GuideIntroduces a TTL-aware LRU cache and resolver/session lifecycle utilities to TrueLinkResolver, centralizes common error-raising helpers in BaseResolver to remove duplicated methods from individual resolvers, refactors HTTP file-metadata fetching for clarity, and adds a lightweight script to manually exercise the new behaviors. Class diagram for TrueLinkResolver caching and resolver lifecycleclassDiagram
class _CacheEntry {
+value LinkResult|FolderResult
+timestamp float
__init__(value LinkResult|FolderResult)
}
class _LRUCache {
+max_size int
+ttl int
-_cache OrderedDict~str,_CacheEntry~
+__init__(max_size int, ttl int)
+get(key str) LinkResult|FolderResult|None
+set(key str, value LinkResult|FolderResult) None
+clear() None
+cleanup_expired() int
}
class TrueLinkResolver {
+timeout int
+max_retries int
+proxy str|None
-_cache_max_size int
-_cache_ttl int
+_resolvers dict~str,type~
+_resolver_instances dict~str,object~
+_cache _LRUCache
+__init__(timeout int, max_retries int, proxy str|None, cache_max_size int, cache_ttl int)
+resolve(url str, use_cache bool) LinkResult|FolderResult
+clear_cache() None
+cleanup_cache() int
+is_supported(url str) bool
+get_supported_domains() list~str~
+cleanup_resolver_instances() None
+register_resolver(domain_pattern str, resolver_cls type) None
+_get_resolver(url str) object
}
TrueLinkResolver --> _LRUCache : uses
_LRUCache o--> _CacheEntry : contains
Class diagram for BaseResolver error helpers and fetch refactorclassDiagram
class BaseResolver {
-session aiohttp.ClientSession|None
+__aenter__() BaseResolver
+__aexit__(exc_type type|None, exc_val BaseException|None, exc_tb TracebackType|None) bool
+_create_session() None
+_close_session() None
+_fetch_file_details(url str, headers dict~str,str~|None) tuple~str|None,int|None,str|None~
+_extract_filename(content_disposition str) str|None
+_get_filename_from_url(url str) str
+_raise_extraction_failed(msg str) None
+_raise_invalid_url(msg str) None
}
class MediafireResolver {
+resolve(url str) LinkResult|FolderResult
+collect_files(folder_key str, path_prefix str) None
}
BaseResolver <|-- MediafireResolver : inherits
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Code Review
This pull request introduces a robust LRU cache with TTL support to the TrueLinkResolver, replacing the previous basic dictionary-based caching mechanism. It also centralizes exception-raising helpers in the BaseResolver, leading to significant code cleanup across multiple resolver implementations. Additionally, new methods for cache management and resolver instance cleanup have been added. Feedback focuses on two main areas: first, the constructor parameters for cache configuration are currently ignored because they do not affect the shared class-level cache instance; second, the session cleanup logic relies on deprecated event loop methods and should be refactored into an asynchronous method for better reliability.
There was a problem hiding this comment.
Hey - I've found 6 issues, and left some high level feedback:
- The new
cache_max_sizeandcache_ttlparameters onTrueLinkResolverare stored on the instance but never applied to the class-level_cache, so they currently have no effect on cache behavior; consider wiring these through to construct/configure the_LRUCacheor making the cache instance-scoped instead of aClassVar. - The new
cleanup_resolver_instancesmethod mixes synchronous and asynchronous event-loop control (get_event_loop,run_until_complete,create_task) and swallows all exceptions, which can behave unpredictably in different runtime contexts; consider requiring an event loop to be passed in or making it fully async and letting callers decide how to schedule it, and avoid catching bareException. - The
test_changes.pyscript at the repo root looks like an ad-hoc manual test harness (e.g., printing to stdout, touching private attributes) rather than a reusable test; consider removing it from the PR or converting the relevant checks into proper automated tests under the existing test framework.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `cache_max_size` and `cache_ttl` parameters on `TrueLinkResolver` are stored on the instance but never applied to the class-level `_cache`, so they currently have no effect on cache behavior; consider wiring these through to construct/configure the `_LRUCache` or making the cache instance-scoped instead of a `ClassVar`.
- The new `cleanup_resolver_instances` method mixes synchronous and asynchronous event-loop control (`get_event_loop`, `run_until_complete`, `create_task`) and swallows all exceptions, which can behave unpredictably in different runtime contexts; consider requiring an event loop to be passed in or making it fully async and letting callers decide how to schedule it, and avoid catching bare `Exception`.
- The `test_changes.py` script at the repo root looks like an ad-hoc manual test harness (e.g., printing to stdout, touching private attributes) rather than a reusable test; consider removing it from the PR or converting the relevant checks into proper automated tests under the existing test framework.
## Individual Comments
### Comment 1
<location path="src/truelink/core.py" line_range="116" />
<code_context>
_resolvers: ClassVar[dict[str, type]] = {}
_resolver_instances: ClassVar[dict[str, object]] = {}
+ _cache: ClassVar[_LRUCache] = _LRUCache(max_size=1000, ttl=3600)
def __init__(
</code_context>
<issue_to_address>
**issue (bug_risk):** Constructor cache parameters are stored but never applied to the shared LRU cache instance.
`TrueLinkResolver.__init__` accepts `cache_max_size` and `cache_ttl` and stores them on the instance, but the actual cache is the class-level `_cache = _LRUCache(max_size=1000, ttl=3600)`, which ignores these values. As a result, changing the constructor arguments doesn’t alter caching behavior. Either wire these parameters into how `_cache` is created (e.g., lazy init using the first instance’s settings), make the cache per-instance, or remove the unused parameters to avoid misleading API surface.
</issue_to_address>
### Comment 2
<location path="src/truelink/core.py" line_range="290-299" />
<code_context>
+ return list(cls._resolvers.keys())
+
+ @classmethod
+ def cleanup_resolver_instances(cls) -> None:
+ """Clean up all resolver instances and close their sessions."""
+ for instance in cls._resolver_instances.values():
+ if hasattr(instance, "session") and instance.session:
+ import asyncio
+
+ try:
+ loop = asyncio.get_event_loop()
+ if loop.is_running():
+ asyncio.create_task(instance.session.close())
+ else:
+ loop.run_until_complete(instance.session.close())
+ except Exception:
+ pass
+ cls._resolver_instances.clear()
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The resolver cleanup logic around asyncio event loops is fragile and can silently skip closing sessions.
In `cleanup_resolver_instances`, `asyncio.get_event_loop()` can raise `RuntimeError` when no default loop exists (e.g. worker threads, newer asyncio usage), and the blanket `except Exception: pass` will silently skip closing sessions. Consider using `asyncio.get_running_loop()` when a loop is active and a safe fallback like `asyncio.run(instance.session.close())` in sync contexts, and tightening the `except` to only handle expected asyncio/loop errors so real issues aren’t hidden and sessions are reliably closed.
</issue_to_address>
### Comment 3
<location path="test_changes.py" line_range="1-10" />
<code_context>
+#!/usr/bin/env python3
</code_context>
<issue_to_address>
**issue (testing):** Convert this ad‑hoc script into proper automated tests runnable by the test suite
This file is currently a manual verification script (`print` + `__main__` + `sys.exit`), so it won’t run under pytest/unittest or in CI. To integrate it into your automated test suite:
- Replace `print`/`sys.exit` with assertions in proper test functions.
- Drop the `if __name__ == "__main__"` runner and rely on test discovery.
- Move/rename it to match your standard test layout (e.g., under `tests/`).
That way the cache, resolver cleanup, and base methods are exercised on every CI run instead of only when run manually.
</issue_to_address>
### Comment 4
<location path="test_changes.py" line_range="13-22" />
<code_context>
+def test_cache():
</code_context>
<issue_to_address>
**suggestion (testing):** Replace print-based checks in `test_cache` with assertions that validate behavior
Right now `test_cache` only prints cache details, so it will still pass even if the cache is broken. Please replace the prints with assertions, for example:
- Type: `assert isinstance(resolver._cache, _LRUCache)` (import `_LRUCache` if appropriate)
- Config: `assert resolver._cache.max_size == 10` and `assert resolver._cache.ttl == 60`
- Behavior:
- After `TrueLinkResolver.clear_cache()`, cached values for a URL are no longer returned
- Inserting more than `max_size` entries evicts the oldest (LRU behavior)
- After simulating TTL expiry (via time mocking), entries are not returned and `cleanup_cache()` reports the correct number removed
This will ensure the test actually verifies the cache behavior instead of just exercising the code path.
Suggested implementation:
```python
import importlib
from truelink import TrueLinkResolver, _LRUCache
```
```python
def test_cache(monkeypatch):
"""Test cache functionality."""
# Initialize resolver with known cache settings
resolver = TrueLinkResolver(cache_max_size=10, cache_ttl=60)
# 1. Cache type and configuration
assert isinstance(resolver._cache, _LRUCache)
assert resolver._cache.max_size == 10
assert resolver._cache.ttl == 60
# 2. Behavior: clear_cache() removes cached values
url = "https://example.com/clear-cache"
result_1 = resolver.resolve(url)
# Ensure the value is cached (cache should contain the URL)
assert url in resolver._cache
TrueLinkResolver.clear_cache()
# Cache should be empty after clear
assert url not in resolver._cache
assert len(resolver._cache) == 0
# 3. Behavior: LRU eviction when exceeding max_size
# Fill the cache with more than max_size distinct entries
base_url = "https://example.com/lru-"
for i in range(11):
resolver.resolve(f"{base_url}{i}")
# Cache should not exceed max_size
assert len(resolver._cache) == 10
# Oldest entry (LRU) should be evicted
assert f"{base_url}0" not in resolver._cache
# Newest entry should be present
assert f"{base_url}10" in resolver._cache
# 4. Behavior: TTL expiry and cleanup_cache()
# Dynamically locate the module where _LRUCache (and its time usage) is defined
cache_module = importlib.import_module(_LRUCache.__module__)
# Some implementations may use time.time, others time.monotonic; support both.
# We prefer monotonic if present, otherwise fall back to time.
if hasattr(cache_module, "time") and hasattr(cache_module.time, "monotonic"):
current_time = [cache_module.time.monotonic()]
def fake_monotonic():
return current_time[0]
monkeypatch.setattr(cache_module.time, "monotonic", fake_monotonic, raising=False)
elif hasattr(cache_module, "time") and hasattr(cache_module.time, "time"):
current_time = [cache_module.time.time()]
def fake_time():
return current_time[0]
monkeypatch.setattr(cache_module.time, "time", fake_time, raising=False)
else:
# If the cache implementation does not use a standard time module,
# this test would need to be adjusted to its specific design.
# Fail explicitly so the behavior is not silently untested.
raise RuntimeError("Cache module does not expose a supported time function for TTL testing")
# Recreate resolver to ensure we start with a fresh cache using the patched time
TrueLinkResolver.clear_cache()
resolver = TrueLinkResolver(cache_max_size=10, cache_ttl=60)
ttl_url = "https://example.com/ttl"
resolver.resolve(ttl_url)
assert ttl_url in resolver._cache
# Advance time beyond the TTL to simulate expiry
current_time[0] += 61 # > cache_ttl (60)
# cleanup_cache should remove expired entries and report how many were removed
removed = TrueLinkResolver.cleanup_cache()
assert removed >= 1
# After cleanup, the expired entry must no longer be present
assert ttl_url not in resolver._cache
```
If other tests rely on `test_cache` having the old signature (without `monkeypatch`), they should be updated to not call `test_cache` directly or to pass the `monkeypatch` fixture appropriately (pytest will inject it automatically when running the test normally). Also, if the actual cache implementation uses a different timing mechanism than `cache_module.time.monotonic` or `cache_module.time.time`, the TTL-testing part of this test should be adapted to monkeypatch the correct function.
</issue_to_address>
### Comment 5
<location path="test_changes.py" line_range="19-28" />
<code_context>
+ print("Testing Cache Management")
+ print("=" * 50)
+
+ resolver = TrueLinkResolver(cache_max_size=10, cache_ttl=60)
+
+ # Test cache type
+ print(f"\n1. Cache type: {type(resolver._cache).__name__}")
+ print(f" Expected: _LRUCache")
+
+ # Test cache attributes
+ print(f"\n2. Cache attributes:")
+ print(f" - Has max_size: {hasattr(resolver._cache, 'max_size')}")
+ print(f" - Has ttl: {hasattr(resolver._cache, 'ttl')}")
+ print(f" - Has get method: {hasattr(resolver._cache, 'get')}")
+ print(f" - Has set method: {hasattr(resolver._cache, 'set')}")
+ print(f" - Has clear method: {hasattr(resolver._cache, 'clear')}")
+ print(f" - Has cleanup_expired method: {hasattr(resolver._cache, 'cleanup_expired')}")
+
+ if hasattr(resolver._cache, 'max_size'):
+ print(f"\n3. Cache configuration:")
+ print(f" - max_size: {resolver._cache.max_size}")
</code_context>
<issue_to_address>
**suggestion (testing):** Avoid testing via private attributes and focus on public behavior of the caching API
This test reaches into `resolver._cache`, which is a private implementation detail and makes the test brittle against internal refactors. Instead, prefer exercising caching via the public API:
- Call `resolve(..., use_cache=True)` twice and assert the second call uses the cache (e.g., via a mock or call counter).
- Use `TrueLinkResolver.clear_cache()` / `cleanup_cache()` and assert observable behavior rather than inspecting `.max_size`/`.ttl`.
If you need to verify configuration, assert it through constructor arguments and resulting behavior, not by directly accessing `_cache`.
</issue_to_address>
### Comment 6
<location path="test_changes.py" line_range="57-66" />
<code_context>
+ TrueLinkResolver.cleanup_resolver_instances()
+ print(" - cleanup_resolver_instances() executed successfully")
+
+def test_base_methods():
+ """Test BaseResolver methods."""
+ print("\n" + "=" * 50)
+ print("Testing BaseResolver Methods")
+ print("=" * 50)
+
+ from truelink.resolvers.base import BaseResolver
+
+ print(f"\n1. BaseResolver has _raise_extraction_failed: {hasattr(BaseResolver, '_raise_extraction_failed')}")
+ print(f"2. BaseResolver has _raise_invalid_url: {hasattr(BaseResolver, '_raise_invalid_url')}")
+
+def test_other_methods():
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen `test_base_methods` by asserting the new helper methods actually raise the expected exceptions
Right now this test only prints whether the helpers exist via `hasattr`, so it doesn’t validate their behavior. Please turn this into an assertion-based test (e.g. `with pytest.raises(ExtractionFailedException): BaseResolver()._raise_extraction_failed("msg")` and similarly for `InvalidURLException`), and optionally assert on the message text so we’ll catch any future changes that break these helpers’ semantics, not just their presence.
Suggested implementation:
```python
def test_base_methods():
"""Test BaseResolver helper methods raise the expected exceptions."""
import pytest
from truelink.resolvers.base import BaseResolver
from truelink.exceptions import ExtractionFailedException, InvalidURLException
resolver = BaseResolver()
# _raise_extraction_failed should raise ExtractionFailedException with the given message
with pytest.raises(ExtractionFailedException) as exc_info_extraction:
resolver._raise_extraction_failed("extraction failed message")
assert "extraction failed message" in str(exc_info_extraction.value)
# _raise_invalid_url should raise InvalidURLException with the given message
with pytest.raises(InvalidURLException) as exc_info_invalid:
resolver._raise_invalid_url("invalid url message")
assert "invalid url message" in str(exc_info_invalid.value)
```
If `ExtractionFailedException` and `InvalidURLException` live in a different module than `truelink.exceptions`, update the import path accordingly.
If `BaseResolver` is abstract and cannot be instantiated directly, create a lightweight concrete subclass in the test (e.g. `class DummyResolver(BaseResolver): pass`) and instantiate that instead of `BaseResolver()`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Summary by Sourcery
Introduce centralized caching and error-handling utilities to TrueLink and clean up resolver/session management.
New Features:
Enhancements:
Tests: