Skip to content

COde Duplication FIxed!!#74

Merged
5hojib merged 16 commits into
5hojib:mainfrom
PandazNetwork:main
Apr 20, 2026
Merged

COde Duplication FIxed!!#74
5hojib merged 16 commits into
5hojib:mainfrom
PandazNetwork:main

Conversation

@PandazNetwork
Copy link
Copy Markdown
Contributor

@PandazNetwork PandazNetwork commented Apr 15, 2026

Summary by Sourcery

Introduce centralized caching and error-handling utilities to TrueLink and clean up resolver/session management.

New Features:

  • Add an internal LRU cache with TTL support for TrueLinkResolver results, configurable via constructor options.

Enhancements:

  • Expose cache management APIs on TrueLinkResolver for clearing entries and cleaning up expired items.
  • Add a classmethod to clean up resolver instances and their underlying HTTP sessions.
  • Centralize extraction and invalid-URL error raising logic in BaseResolver to remove duplicated helper methods across resolvers.
  • Refine supported-domain lookup to use class-level resolver registration for better extensibility.
  • Simplify and streamline BaseResolver file metadata fetching while preserving existing behavior.

Tests:

  • Add a standalone test script to exercise cache behavior, resolver/session cleanup, BaseResolver helpers, and supported-domain queries.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Apr 15, 2026

Reviewer's Guide

Introduces a TTL-aware LRU cache and resolver/session lifecycle utilities to TrueLinkResolver, centralizes common error-raising helpers in BaseResolver to remove duplicated methods from individual resolvers, refactors HTTP file-metadata fetching for clarity, and adds a lightweight script to manually exercise the new behaviors.

Class diagram for TrueLinkResolver caching and resolver lifecycle

classDiagram
    class _CacheEntry {
        +value LinkResult|FolderResult
        +timestamp float
        __init__(value LinkResult|FolderResult)
    }

    class _LRUCache {
        +max_size int
        +ttl int
        -_cache OrderedDict~str,_CacheEntry~
        +__init__(max_size int, ttl int)
        +get(key str) LinkResult|FolderResult|None
        +set(key str, value LinkResult|FolderResult) None
        +clear() None
        +cleanup_expired() int
    }

    class TrueLinkResolver {
        +timeout int
        +max_retries int
        +proxy str|None
        -_cache_max_size int
        -_cache_ttl int
        +_resolvers dict~str,type~
        +_resolver_instances dict~str,object~
        +_cache _LRUCache
        +__init__(timeout int, max_retries int, proxy str|None, cache_max_size int, cache_ttl int)
        +resolve(url str, use_cache bool) LinkResult|FolderResult
        +clear_cache() None
        +cleanup_cache() int
        +is_supported(url str) bool
        +get_supported_domains() list~str~
        +cleanup_resolver_instances() None
        +register_resolver(domain_pattern str, resolver_cls type) None
        +_get_resolver(url str) object
    }

    TrueLinkResolver --> _LRUCache : uses
    _LRUCache o--> _CacheEntry : contains
Loading

Class diagram for BaseResolver error helpers and fetch refactor

classDiagram
    class BaseResolver {
        -session aiohttp.ClientSession|None
        +__aenter__() BaseResolver
        +__aexit__(exc_type type|None, exc_val BaseException|None, exc_tb TracebackType|None) bool
        +_create_session() None
        +_close_session() None
        +_fetch_file_details(url str, headers dict~str,str~|None) tuple~str|None,int|None,str|None~
        +_extract_filename(content_disposition str) str|None
        +_get_filename_from_url(url str) str
        +_raise_extraction_failed(msg str) None
        +_raise_invalid_url(msg str) None
    }

    class MediafireResolver {
        +resolve(url str) LinkResult|FolderResult
        +collect_files(folder_key str, path_prefix str) None
    }

    BaseResolver <|-- MediafireResolver : inherits
Loading

File-Level Changes

Change Details Files
Add an internal TTL-based LRU cache implementation and integrate it into TrueLinkResolver for URL resolution caching and cache management.
  • Introduce _CacheEntry and _LRUCache helper classes using OrderedDict and per-entry timestamps
  • Replace the previous dict-based TrueLinkResolver cache with the new _LRUCache, including get/set usage in resolve()
  • Extend TrueLinkResolver.init to accept cache_max_size and cache_ttl parameters and store them on the instance
  • Expose cache management APIs via clear_cache() and cleanup_cache() classmethods
src/truelink/core.py
Enhance TrueLinkResolver utilities for resolver lifecycle and supported-domain inspection.
  • Change get_supported_domains from staticmethod to classmethod and implement it using cls._resolvers
  • Add cleanup_resolver_instances classmethod to close resolver sessions and clear _resolver_instances safely, handling running and non-running event loops
src/truelink/core.py
Refactor BaseResolver’s file metadata fetch logic and centralize common exception-raising helpers.
  • Reorganize _fetch_file_details to more clearly separate HEAD and fallback GET-with-Range paths while preserving behavior
  • Ensure session creation is attempted once and early, with immediate return on failure and consistent cleanup of temporary sessions
  • Add _raise_extraction_failed and _raise_invalid_url helpers that raise the corresponding exceptions with a message
src/truelink/resolvers/base.py
Remove duplicated _raise_extraction_failed/_raise_invalid_url implementations from individual resolver classes now covered by BaseResolver.
  • Delete resolver-specific _raise_extraction_failed overrides in multiple resolvers
  • Delete resolver-specific _raise_invalid_url overrides where they only re-raised InvalidURLException
  • Rely on the new BaseResolver helper methods for raising these exceptions
src/truelink/resolvers/mediafire.py
src/truelink/resolvers/pixeldrain.py
src/truelink/resolvers/buzzheavier.py
src/truelink/resolvers/fichier.py
src/truelink/resolvers/fuckingfast.py
src/truelink/resolvers/linkbox.py
src/truelink/resolvers/linkvertise.py
src/truelink/resolvers/lulacloud.py
src/truelink/resolvers/mediafile.py
src/truelink/resolvers/onedrive.py
src/truelink/resolvers/pcloud.py
src/truelink/resolvers/streamtape.py
src/truelink/resolvers/terabox.py
src/truelink/resolvers/tmpsend.py
src/truelink/resolvers/uploadee.py
src/truelink/resolvers/xham.py
src/truelink/resolvers/yandexdisk.py
Add a standalone test script to manually verify the new cache, resolver cleanup, BaseResolver helpers, and supported-domain utilities.
  • Create test_changes.py script that reloads truelink modules and exercises cache methods, session cleanup, BaseResolver helper presence, is_supported, and get_supported_domains
  • Prints simple diagnostic output and exits non-zero on failure to aid manual verification
test_changes.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a robust LRU cache with TTL support to the TrueLinkResolver, replacing the previous basic dictionary-based caching mechanism. It also centralizes exception-raising helpers in the BaseResolver, leading to significant code cleanup across multiple resolver implementations. Additionally, new methods for cache management and resolver instance cleanup have been added. Feedback focuses on two main areas: first, the constructor parameters for cache configuration are currently ignored because they do not affect the shared class-level cache instance; second, the session cleanup logic relies on deprecated event loop methods and should be refactored into an asynchronous method for better reliability.

Comment thread src/truelink/core.py
Comment thread src/truelink/core.py
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 6 issues, and left some high level feedback:

  • The new cache_max_size and cache_ttl parameters on TrueLinkResolver are stored on the instance but never applied to the class-level _cache, so they currently have no effect on cache behavior; consider wiring these through to construct/configure the _LRUCache or making the cache instance-scoped instead of a ClassVar.
  • The new cleanup_resolver_instances method mixes synchronous and asynchronous event-loop control (get_event_loop, run_until_complete, create_task) and swallows all exceptions, which can behave unpredictably in different runtime contexts; consider requiring an event loop to be passed in or making it fully async and letting callers decide how to schedule it, and avoid catching bare Exception.
  • The test_changes.py script at the repo root looks like an ad-hoc manual test harness (e.g., printing to stdout, touching private attributes) rather than a reusable test; consider removing it from the PR or converting the relevant checks into proper automated tests under the existing test framework.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `cache_max_size` and `cache_ttl` parameters on `TrueLinkResolver` are stored on the instance but never applied to the class-level `_cache`, so they currently have no effect on cache behavior; consider wiring these through to construct/configure the `_LRUCache` or making the cache instance-scoped instead of a `ClassVar`.
- The new `cleanup_resolver_instances` method mixes synchronous and asynchronous event-loop control (`get_event_loop`, `run_until_complete`, `create_task`) and swallows all exceptions, which can behave unpredictably in different runtime contexts; consider requiring an event loop to be passed in or making it fully async and letting callers decide how to schedule it, and avoid catching bare `Exception`.
- The `test_changes.py` script at the repo root looks like an ad-hoc manual test harness (e.g., printing to stdout, touching private attributes) rather than a reusable test; consider removing it from the PR or converting the relevant checks into proper automated tests under the existing test framework.

## Individual Comments

### Comment 1
<location path="src/truelink/core.py" line_range="116" />
<code_context>

     _resolvers: ClassVar[dict[str, type]] = {}
     _resolver_instances: ClassVar[dict[str, object]] = {}
+    _cache: ClassVar[_LRUCache] = _LRUCache(max_size=1000, ttl=3600)

     def __init__(
</code_context>
<issue_to_address>
**issue (bug_risk):** Constructor cache parameters are stored but never applied to the shared LRU cache instance.

`TrueLinkResolver.__init__` accepts `cache_max_size` and `cache_ttl` and stores them on the instance, but the actual cache is the class-level `_cache = _LRUCache(max_size=1000, ttl=3600)`, which ignores these values. As a result, changing the constructor arguments doesn’t alter caching behavior. Either wire these parameters into how `_cache` is created (e.g., lazy init using the first instance’s settings), make the cache per-instance, or remove the unused parameters to avoid misleading API surface.
</issue_to_address>

### Comment 2
<location path="src/truelink/core.py" line_range="290-299" />
<code_context>
+        return list(cls._resolvers.keys())
+
+    @classmethod
+    def cleanup_resolver_instances(cls) -> None:
+        """Clean up all resolver instances and close their sessions."""
+        for instance in cls._resolver_instances.values():
+            if hasattr(instance, "session") and instance.session:
+                import asyncio
+
+                try:
+                    loop = asyncio.get_event_loop()
+                    if loop.is_running():
+                        asyncio.create_task(instance.session.close())
+                    else:
+                        loop.run_until_complete(instance.session.close())
+                except Exception:
+                    pass
+        cls._resolver_instances.clear()
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The resolver cleanup logic around asyncio event loops is fragile and can silently skip closing sessions.

In `cleanup_resolver_instances`, `asyncio.get_event_loop()` can raise `RuntimeError` when no default loop exists (e.g. worker threads, newer asyncio usage), and the blanket `except Exception: pass` will silently skip closing sessions. Consider using `asyncio.get_running_loop()` when a loop is active and a safe fallback like `asyncio.run(instance.session.close())` in sync contexts, and tightening the `except` to only handle expected asyncio/loop errors so real issues aren’t hidden and sessions are reliably closed.
</issue_to_address>

### Comment 3
<location path="test_changes.py" line_range="1-10" />
<code_context>
+#!/usr/bin/env python3
</code_context>
<issue_to_address>
**issue (testing):** Convert this ad‑hoc script into proper automated tests runnable by the test suite

This file is currently a manual verification script (`print` + `__main__` + `sys.exit`), so it won’t run under pytest/unittest or in CI. To integrate it into your automated test suite:

- Replace `print`/`sys.exit` with assertions in proper test functions.
- Drop the `if __name__ == "__main__"` runner and rely on test discovery.
- Move/rename it to match your standard test layout (e.g., under `tests/`).

That way the cache, resolver cleanup, and base methods are exercised on every CI run instead of only when run manually.
</issue_to_address>

### Comment 4
<location path="test_changes.py" line_range="13-22" />
<code_context>
+def test_cache():
</code_context>
<issue_to_address>
**suggestion (testing):** Replace print-based checks in `test_cache` with assertions that validate behavior

Right now `test_cache` only prints cache details, so it will still pass even if the cache is broken. Please replace the prints with assertions, for example:

- Type: `assert isinstance(resolver._cache, _LRUCache)` (import `_LRUCache` if appropriate)
- Config: `assert resolver._cache.max_size == 10` and `assert resolver._cache.ttl == 60`
- Behavior:
  - After `TrueLinkResolver.clear_cache()`, cached values for a URL are no longer returned
  - Inserting more than `max_size` entries evicts the oldest (LRU behavior)
  - After simulating TTL expiry (via time mocking), entries are not returned and `cleanup_cache()` reports the correct number removed

This will ensure the test actually verifies the cache behavior instead of just exercising the code path.

Suggested implementation:

```python
import importlib

from truelink import TrueLinkResolver, _LRUCache

```

```python
def test_cache(monkeypatch):
    """Test cache functionality."""
    # Initialize resolver with known cache settings
    resolver = TrueLinkResolver(cache_max_size=10, cache_ttl=60)

    # 1. Cache type and configuration
    assert isinstance(resolver._cache, _LRUCache)
    assert resolver._cache.max_size == 10
    assert resolver._cache.ttl == 60

    # 2. Behavior: clear_cache() removes cached values
    url = "https://example.com/clear-cache"
    result_1 = resolver.resolve(url)
    # Ensure the value is cached (cache should contain the URL)
    assert url in resolver._cache

    TrueLinkResolver.clear_cache()

    # Cache should be empty after clear
    assert url not in resolver._cache
    assert len(resolver._cache) == 0

    # 3. Behavior: LRU eviction when exceeding max_size
    # Fill the cache with more than max_size distinct entries
    base_url = "https://example.com/lru-"
    for i in range(11):
        resolver.resolve(f"{base_url}{i}")

    # Cache should not exceed max_size
    assert len(resolver._cache) == 10

    # Oldest entry (LRU) should be evicted
    assert f"{base_url}0" not in resolver._cache
    # Newest entry should be present
    assert f"{base_url}10" in resolver._cache

    # 4. Behavior: TTL expiry and cleanup_cache()
    # Dynamically locate the module where _LRUCache (and its time usage) is defined
    cache_module = importlib.import_module(_LRUCache.__module__)

    # Some implementations may use time.time, others time.monotonic; support both.
    # We prefer monotonic if present, otherwise fall back to time.
    if hasattr(cache_module, "time") and hasattr(cache_module.time, "monotonic"):
        current_time = [cache_module.time.monotonic()]

        def fake_monotonic():
            return current_time[0]

        monkeypatch.setattr(cache_module.time, "monotonic", fake_monotonic, raising=False)
    elif hasattr(cache_module, "time") and hasattr(cache_module.time, "time"):
        current_time = [cache_module.time.time()]

        def fake_time():
            return current_time[0]

        monkeypatch.setattr(cache_module.time, "time", fake_time, raising=False)
    else:
        # If the cache implementation does not use a standard time module,
        # this test would need to be adjusted to its specific design.
        # Fail explicitly so the behavior is not silently untested.
        raise RuntimeError("Cache module does not expose a supported time function for TTL testing")

    # Recreate resolver to ensure we start with a fresh cache using the patched time
    TrueLinkResolver.clear_cache()
    resolver = TrueLinkResolver(cache_max_size=10, cache_ttl=60)

    ttl_url = "https://example.com/ttl"
    resolver.resolve(ttl_url)
    assert ttl_url in resolver._cache

    # Advance time beyond the TTL to simulate expiry
    current_time[0] += 61  # > cache_ttl (60)

    # cleanup_cache should remove expired entries and report how many were removed
    removed = TrueLinkResolver.cleanup_cache()
    assert removed >= 1

    # After cleanup, the expired entry must no longer be present
    assert ttl_url not in resolver._cache

```

If other tests rely on `test_cache` having the old signature (without `monkeypatch`), they should be updated to not call `test_cache` directly or to pass the `monkeypatch` fixture appropriately (pytest will inject it automatically when running the test normally). Also, if the actual cache implementation uses a different timing mechanism than `cache_module.time.monotonic` or `cache_module.time.time`, the TTL-testing part of this test should be adapted to monkeypatch the correct function.
</issue_to_address>

### Comment 5
<location path="test_changes.py" line_range="19-28" />
<code_context>
+    print("Testing Cache Management")
+    print("=" * 50)
+
+    resolver = TrueLinkResolver(cache_max_size=10, cache_ttl=60)
+
+    # Test cache type
+    print(f"\n1. Cache type: {type(resolver._cache).__name__}")
+    print(f"   Expected: _LRUCache")
+
+    # Test cache attributes
+    print(f"\n2. Cache attributes:")
+    print(f"   - Has max_size: {hasattr(resolver._cache, 'max_size')}")
+    print(f"   - Has ttl: {hasattr(resolver._cache, 'ttl')}")
+    print(f"   - Has get method: {hasattr(resolver._cache, 'get')}")
+    print(f"   - Has set method: {hasattr(resolver._cache, 'set')}")
+    print(f"   - Has clear method: {hasattr(resolver._cache, 'clear')}")
+    print(f"   - Has cleanup_expired method: {hasattr(resolver._cache, 'cleanup_expired')}")
+
+    if hasattr(resolver._cache, 'max_size'):
+        print(f"\n3. Cache configuration:")
+        print(f"   - max_size: {resolver._cache.max_size}")
</code_context>
<issue_to_address>
**suggestion (testing):** Avoid testing via private attributes and focus on public behavior of the caching API

This test reaches into `resolver._cache`, which is a private implementation detail and makes the test brittle against internal refactors. Instead, prefer exercising caching via the public API:

- Call `resolve(..., use_cache=True)` twice and assert the second call uses the cache (e.g., via a mock or call counter).
- Use `TrueLinkResolver.clear_cache()` / `cleanup_cache()` and assert observable behavior rather than inspecting `.max_size`/`.ttl`.

If you need to verify configuration, assert it through constructor arguments and resulting behavior, not by directly accessing `_cache`.
</issue_to_address>

### Comment 6
<location path="test_changes.py" line_range="57-66" />
<code_context>
+    TrueLinkResolver.cleanup_resolver_instances()
+    print("   - cleanup_resolver_instances() executed successfully")
+
+def test_base_methods():
+    """Test BaseResolver methods."""
+    print("\n" + "=" * 50)
+    print("Testing BaseResolver Methods")
+    print("=" * 50)
+
+    from truelink.resolvers.base import BaseResolver
+
+    print(f"\n1. BaseResolver has _raise_extraction_failed: {hasattr(BaseResolver, '_raise_extraction_failed')}")
+    print(f"2. BaseResolver has _raise_invalid_url: {hasattr(BaseResolver, '_raise_invalid_url')}")
+
+def test_other_methods():
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen `test_base_methods` by asserting the new helper methods actually raise the expected exceptions

Right now this test only prints whether the helpers exist via `hasattr`, so it doesn’t validate their behavior. Please turn this into an assertion-based test (e.g. `with pytest.raises(ExtractionFailedException): BaseResolver()._raise_extraction_failed("msg")` and similarly for `InvalidURLException`), and optionally assert on the message text so we’ll catch any future changes that break these helpers’ semantics, not just their presence.

Suggested implementation:

```python
def test_base_methods():
    """Test BaseResolver helper methods raise the expected exceptions."""
    import pytest
    from truelink.resolvers.base import BaseResolver
    from truelink.exceptions import ExtractionFailedException, InvalidURLException

    resolver = BaseResolver()

    # _raise_extraction_failed should raise ExtractionFailedException with the given message
    with pytest.raises(ExtractionFailedException) as exc_info_extraction:
        resolver._raise_extraction_failed("extraction failed message")
    assert "extraction failed message" in str(exc_info_extraction.value)

    # _raise_invalid_url should raise InvalidURLException with the given message
    with pytest.raises(InvalidURLException) as exc_info_invalid:
        resolver._raise_invalid_url("invalid url message")
    assert "invalid url message" in str(exc_info_invalid.value)

```

If `ExtractionFailedException` and `InvalidURLException` live in a different module than `truelink.exceptions`, update the import path accordingly.  
If `BaseResolver` is abstract and cannot be instantiated directly, create a lightweight concrete subclass in the test (e.g. `class DummyResolver(BaseResolver): pass`) and instantiate that instead of `BaseResolver()`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/truelink/core.py
Comment thread src/truelink/core.py
Comment thread test_changes.py
Comment thread test_changes.py
Comment thread test_changes.py
Comment thread test_changes.py
@5hojib 5hojib merged commit 05369f7 into 5hojib:main Apr 20, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants