
fix: Club Players endpoint - DOB_AGE XPath and season_id handling#6

Merged
eskobar95 merged 41 commits into main from fix/anti-scraping-optimization
Dec 6, 2025

Conversation

@eskobar95 (Owner) commented Dec 6, 2025

Fixes

Club Players Endpoint Issues

  1. Fixed DOB_AGE XPath selector - Updated to match zentriert td cells containing a date pattern, resolving the issue where 0 players were returned despite player URLs and names being found
  2. Handle None season_id - Default to current year if season_id is None (prevents 'None' in URL, similar to Club Competitions)
  3. Added missing response fields - Added club name and seasonId to Club Players response

Changes

  • app/utils/xpath.py: Updated Clubs.Players.DOB_AGE XPath to use pattern matching for date columns
  • app/services/clubs/players.py:
    • Default season_id to current year in __post_init__ if None (see the sketch after this list)
    • Improved __update_season_id to validate the extracted season
    • Added name and seasonId to response in get_club_players
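
A minimal sketch of the two core fixes, using illustrative names for anything not quoted above; the real code in app/services/clubs/players.py and app/utils/xpath.py may differ in detail:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TransfermarktClubPlayers:
    club_id: str
    season_id: str | None = None

    def __post_init__(self) -> None:
        # Default to the current year so the request URL never contains "None".
        if self.season_id is None:
            self.season_id = str(datetime.now().year)


# DOB_AGE: match centered ("zentriert") cells whose text looks like a date, i.e. it
# contains "/" and meets a minimum length, instead of relying on a fixed column index.
# The exact predicate is illustrative, not the project's literal XPath.
DOB_AGE = (
    "//td[contains(@class, 'zentriert') and contains(text(), '/')"
    " and string-length(normalize-space(text())) > 6]/text()"
)
```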

Testing

  • ✅ Local: Returns 41 players for AC Milan (season 2024)
  • ✅ XPath now correctly extracts DOB/Age data
  • ✅ season_id defaults correctly when None

Related Issues

Fixes the issue where the Club Players endpoint returned 0 players despite finding player URLs and names. The root cause was the DOB_AGE XPath returning 0 items, which caused zip() to silently produce an empty result.
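
For illustration, zip() stops at the shortest input, so an empty DOB/Age column silently yields zero rows; a length check of the kind added in a later commit turns the mismatch into a loud error (the values below are dummies):

```python
urls = ["/dummy-player/profil/spieler/1", "/dummy-player/profil/spieler/2"]
names = ["Player One", "Player Two"]
dob_age: list[str] = []  # the broken XPath matched nothing

print(list(zip(urls, names, dob_age)))  # [] -- no exception, just an empty result

# Defensive variant: fail loudly instead of returning an empty player list.
if not (len(urls) == len(names) == len(dob_age)):
    raise ValueError(
        f"Column length mismatch: urls={len(urls)}, "
        f"names={len(names)}, dob_age={len(dob_age)}",
    )
```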

Summary by CodeRabbit

Release Notes

  • New Features

    • Added debug endpoint for scraping diagnostics to test system health
  • Improvements

    • Enhanced error handling and validation in search endpoints
    • Improved data extraction reliability with refined data selectors
    • Better handling of optional browser scraping with graceful fallbacks
    • Clearer error messages for easier troubleshooting


Commit messages (41 commits)

- Add full support for national teams across all club endpoints
- Add new /clubs/{club_id}/competitions endpoint to retrieve club competitions
- Add isNationalTeam field to Club Profile response schema
- Make Club Profile fields optional to accommodate national teams
- Enhance Club Players endpoint to handle national team HTML structure
- Update XPath expressions to support both club and national team structures
- Add intelligent detection logic for national teams
- Maintain backward compatibility with existing club endpoints

This update enables the API to work seamlessly with both regular clubs
and national teams, providing a unified interface for all club-related
data retrieval.
- Add GET /competitions/{competition_id}/seasons endpoint
- Implement TransfermarktCompetitionSeasons service to scrape season data
- Add CompetitionSeason and CompetitionSeasons Pydantic schemas
- Support both cross-year (e.g., 25/26) and single-year (e.g., 2025) seasons
- Handle historical seasons correctly (e.g., 99/00 -> 1999-2000)
- Extract seasons from competition page dropdown/table structure
- Return season_id, season_name, start_year, and end_year for each season
- Sort seasons by start_year descending (newest first)

Closes #[issue-number]
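
A sketch of the season-label handling described above; the helper name and the two-digit-year pivot are assumptions made for illustration:

```python
def parse_season(label: str, pivot: int = 30) -> tuple[int, int]:
    """Return (start_year, end_year) for labels like '25/26', '99/00', or '2025'."""
    if "/" in label:
        start_text, end_text = label.split("/")
        start, end = int(start_text), int(end_text)
        if start < 100:  # expand two-digit years: 99 -> 1999, 25 -> 2025
            start += 1900 if start >= pivot else 2000
        if end < 100:
            end += 1900 if end >= pivot else 2000
            if end < start:  # century rollover, e.g. 99/00 -> 1999-2000
                end += 100
        return start, end
    year = int(label)  # single-year seasons such as '2025'
    return year, year


assert parse_season("25/26") == (2025, 2026)
assert parse_season("99/00") == (1999, 2000)
assert parse_season("2025") == (2025, 2025)
```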
- Detect national team competitions (FIWC, EURO, COPA, AFAC, GOCU, AFCN)
- Use /teilnehmer/pokalwettbewerb/ URL for national team competitions
- Handle season_id correctly (year-1 for national teams in URL)
- Add XPath expressions for participants table
- Limit participants to expected tournament size to exclude non-qualified teams
- Make season_id optional in CompetitionClubs schema
- Update Dockerfile PYTHONPATH configuration
- Add length validation for ids and names before zip() to prevent silent data loss
- Raise descriptive ValueError with logging if ids and names mismatch
- Simplify seasonId assignment logic for national teams
- Remove unnecessary try/except block (isdigit() prevents ValueError)
- Clean up unreachable fallback code
- Add tournament size configuration to Settings class with environment variable support
- Replace hardcoded dict with settings.get_tournament_size() method
- Add warning logging when tournament size is not configured (instead of silent truncation)
- Proceed without truncation when size is unavailable (no silent data loss)
- Add validation for tournament sizes (must be positive integers)
- Add comprehensive unit tests for both configured and fallback paths
- Update README.md with new environment variables documentation

This prevents silent truncation when tournament sizes change (e.g., World Cup expanding to 48)
and allows easy configuration via environment variables.
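
A minimal sketch of configurable tournament sizes, assuming a hypothetical TOURNAMENT_SIZES environment variable in the form "FIWC:32,EURO:24"; the variable name, format, and method signature are illustrative, not the merged Settings implementation:

```python
import logging
import os

logger = logging.getLogger(__name__)


class Settings:
    """Illustrative only: reads tournament sizes from an environment variable."""

    def __init__(self) -> None:
        raw = os.getenv("TOURNAMENT_SIZES", "")  # e.g. "FIWC:32,EURO:24"
        self.tournament_sizes: dict[str, int] = {}
        for entry in filter(None, raw.split(",")):
            code, _, size = entry.partition(":")
            if size.strip().isdigit() and int(size) > 0:  # only positive integers
                self.tournament_sizes[code.strip()] = int(size)

    def get_tournament_size(self, competition_id: str) -> int | None:
        return self.tournament_sizes.get(competition_id)


settings = Settings()
size = settings.get_tournament_size("FIWC")
if size is None:
    # No configured size: warn and keep every participant rather than truncating silently.
    logger.warning("No tournament size configured for FIWC; skipping truncation")
```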
- Remove extra HTTP request to fetch club profile just to read isNationalTeam
- Set is_national_team=None to let TransfermarktClubPlayers use DOM heuristics
- Remove broad except Exception that silently swallowed all errors
- Improve performance by eliminating redundant network call
- Players class already has robust DOM-based detection for national teams
- Move datetime and HTTPException imports from method level to module level
- Improves code readability and marginally improves performance
- Follows Python best practices for import organization
- Move datetime and HTTPException imports from method level to module level
- Improves code readability and marginally improves performance
- Follows Python best practices for import organization
- Keep imports at module level in clubs/competitions.py (from CodeRabbit review)
- Preserve is_national_team flag logic in clubs/players.py
- Keep name padding in competitions/search.py
- Add .DS_Store to .gitignore
- Remove whitespace from blank lines (W293)
- Add missing trailing commas (COM812)
- Split long XPath lines to comply with E501 line length limit
- Format XPath strings to comply with line length
- Format list comprehensions
- Format is_season condition
- Fix session initialization issue causing all HTTP requests to fail
- Improve block detection to avoid false positives
- Optimize browser scraping delays (reduce from 12-13s to 0.4-0.8s)
- Update XPath definitions for clubs, competitions, and players search
- Fix nationalities parsing in player search (relative to each row)
- Add comprehensive monitoring endpoints
- Update settings for anti-scraping configuration

Performance improvements:
- HTTP success rate: 0% → 100%
- Response time: 12-13s → 0.4-0.8s
- Browser fallback: Always → Never needed
- All endpoints now working correctly
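
The figures above are taken from the commit notes; purely as an illustration of the pattern (initializing one HTTP session, reusing it, and keeping any deliberate delay short), not the project's actual code:

```python
import random
import time

import requests

session = requests.Session()  # create once and reuse, instead of per-request sessions
session.headers.update({"User-Agent": "Mozilla/5.0 (compatible; example-client)"})


def fetch(url: str) -> requests.Response:
    time.sleep(random.uniform(0.4, 0.8))  # brief jitter between requests
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response
```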
Resolved conflicts:
- app/services/clubs/players.py: Kept improved nationalities parsing with trim()
- app/settings.py: Kept anti-scraping configuration settings
- app/utils/xpath.py: Combined URL from HEAD with robust NAME fallbacks from main
- Fix import sorting
- Add trailing commas
- Replace single quotes with double quotes
- Add noqa comments for long lines (User-Agent strings, XPath definitions)
- Remove unused variables
- Fix whitespace issues
- Change padding logic for players_joined_on, players_joined, and players_signed_from
- Use "" instead of None to match the default value when elements are None
- Fixes CodeRabbit review: inconsistent placeholder values
- Add try/except for playwright import to handle missing dependency
- Make _browser_scraper optional (None if playwright unavailable)
- Add checks in make_request_with_browser_fallback and get_monitoring_stats
- Update test_browser_scraping endpoint to handle missing playwright
- Add playwright to requirements.txt
- App can now start without playwright, browser scraping disabled if unavailable
- Add playwright install chromium step in Dockerfile
- Only runs if playwright is installed (graceful fallback)
- Ensures browser binaries are available for Railway deployment
- Keep optional playwright import in base.py
- Keep playwright availability check in test_browser_scraping endpoint
- Maintains Railway deployment compatibility
- Keep optional playwright import and checks
- Maintain browser scraper optional initialization
- Preserve playwright availability checks in monitoring stats
- All conflicts resolved, ready for merge
- Validate HTTP responses have content before returning
- Validate browser scraping content before using
- Add detailed logging for debugging deployment issues
- Raise proper exceptions when content is empty or invalid
- Helps diagnose why server returns 200 with empty content
- Add /debug/scraping endpoint to test HTTP, browser, and page requests
- Shows content lengths, errors, and availability status
- Helps diagnose why server returns empty responses
- Better error messages in request_url_page and make_request_with_browser_fallback
- Validate page is not None after request_url_page()
- Add exception handling in endpoints to catch and log errors
- Add XPath error handling with detailed error messages
- Warn when search results are empty (helps diagnose Railway issues)
- Prevents silent failures that return 200 with empty data
- Log page HTML length and content validation
- Log XPath extraction results
- Warn when no results found
- Helps diagnose why server returns empty results while local works
- Default to current year if season_id is None (like Club Competitions)
- Add club name and seasonId to response
- Fix URL formatting to prevent 'None' in URL
- Improve __update_season_id to validate the extracted season
- Fix DOB_AGE XPath to use TD[5] instead of zentriert[2] selector
- Default season_id to current year if None (prevents 'None' in URL)
- Add club name and seasonId to response
- Resolves issue where 0 players were returned despite finding URLs/names
- Use more specific XPath matching zentriert td with date pattern
- Filters for dates (contains '/') and minimum length to avoid false matches
- Resolves issue where DOB_AGE returned 0 items
@coderabbitai (bot) commented Dec 6, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

The changes integrate Playwright browser scraping with graceful fallback mechanisms across the codebase. Conditional Playwright installation is added to the Docker build, availability checks are implemented in services and endpoints, error handling and validation are enhanced throughout, and a debug scraping endpoint is introduced for testing multiple scraping methods.

Changes

  • Infrastructure & Dependencies (Dockerfile, requirements.txt): the Dockerfile adds a conditional Playwright browser installation step guarded by an import check; requirements.txt adds the playwright==1.48.0 dependency
  • API Endpoints (app/api/endpoints/clubs.py, app/api/endpoints/competitions.py): both endpoints are wrapped in try/except blocks with validation that results contain a "results" key; errors are logged and re-raised
  • Main Application & Debug Endpoint (app/main.py): adds import guards for Playwright availability; introduces a new /debug/scraping endpoint that performs sequential HTTP, browser fallback, and full-page scraping checks with aggregated results
  • Service Base Layer (app/services/base.py): introduces conditional Playwright imports with a PLAYWRIGHT_AVAILABLE flag; adds error handling and validation for browser scraping, HTTP fallbacks, and content validation; enhances logging with URL context; adjusts browser scraper initialization and availability reporting
  • Club Services (app/services/clubs/players.py, app/services/clubs/search.py): players.py defaults season_id to the current year, adds guarded season_id extraction with fallback, and augments the response with seasonId and club name; search.py adds try/except with validation in __post_init__, debug logging, and content validation
  • Competition Services (app/services/competitions/search.py): replaces the direct request_url_page() call with a guarded block that validates the result, raises ValueError on failure, and logs URL-aware errors
  • Utilities (app/utils/xpath.py): updates the POSITIONS XPath to use contains() for class matching; updates the DOB_AGE XPath to target cells with date patterns instead of a positional index
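
A minimal sketch of the optional-Playwright pattern summarized above for app/services/base.py; only the PLAYWRIGHT_AVAILABLE flag and the graceful-fallback idea come from the PR, the rest is illustrative:

```python
import logging

logger = logging.getLogger(__name__)

try:
    from playwright.async_api import async_playwright

    PLAYWRIGHT_AVAILABLE = True
except ImportError:
    async_playwright = None
    PLAYWRIGHT_AVAILABLE = False
    logger.warning("playwright not installed; browser scraping is disabled")


async def scrape_with_browser(url: str) -> str:
    """Browser fallback used only when plain HTTP requests fail or return nothing."""
    if not PLAYWRIGHT_AVAILABLE:
        raise RuntimeError("Playwright is unavailable; cannot fall back to a browser")
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        content = await page.content()
        await browser.close()
    if not content or len(content) < 100:  # content validation; threshold is illustrative
        raise ValueError(f"Empty or suspiciously short page returned for {url}")
    return content
```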

Sequence Diagram

sequenceDiagram
    participant Client
    participant APIEndpoint as API Endpoint
    participant DebugEndpoint as /debug/scraping
    participant Base as Service Base
    participant HTTP as HTTP Request
    participant Browser as Playwright Browser
    
    Client->>DebugEndpoint: GET /debug/scraping?url=...
    DebugEndpoint->>Base: request_url_page(url)
    Base->>HTTP: HTTP GET request
    alt HTTP Success
        HTTP-->>Base: HTML content
        Base->>Base: Validate content
        Base-->>DebugEndpoint: success=true, content_length, error=null
    else HTTP Fails
        HTTP-->>Base: Error/Exception
        Base->>Base: Check PLAYWRIGHT_AVAILABLE
        alt Playwright Available
            Base->>Browser: scrape_with_browser(url)
            alt Browser Success
                Browser-->>Base: Page HTML
                Base->>Base: Validate content
                Base-->>DebugEndpoint: success=true, content_length, error=null
            else Browser Fails
                Browser-->>Base: Error
                Base-->>DebugEndpoint: success=false, content_length=0, error msg
            end
        else Playwright Unavailable
            Base-->>DebugEndpoint: success=false, content_length=0, error=ImportError
        end
    end
    DebugEndpoint->>Client: JSON: {http_success, browser_success, full_success, errors}
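
Based on the walkthrough and the diagram above, a rough sketch of what a /debug/scraping endpoint of this shape could look like; it reuses the scrape_with_browser helper sketched earlier, and the helper and field names are assumptions, not the merged implementation:

```python
import requests
from fastapi import FastAPI

app = FastAPI()


def make_http_request(url: str) -> str:
    """Plain HTTP fetch; a stand-in for the project's real HTTP helper."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


@app.get("/debug/scraping")
async def debug_scraping(url: str) -> dict:
    results: dict = {}

    # 1. Plain HTTP request.
    try:
        html = make_http_request(url)
        results["http"] = {"success": True, "content_length": len(html), "error": None}
    except Exception as exc:  # broad on purpose: this endpoint exists for diagnostics
        results["http"] = {"success": False, "content_length": 0, "error": str(exc)}

    # 2. Browser fallback, reusing the optional-Playwright helper sketched earlier.
    try:
        html = await scrape_with_browser(url)
        results["browser"] = {"success": True, "content_length": len(html), "error": None}
    except Exception as exc:
        results["browser"] = {"success": False, "content_length": 0, "error": str(exc)}

    return results
```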

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • app/services/base.py — Dense conditional logic with multiple error-handling paths, fallback mechanisms, and content validation; requires careful tracing of browser availability flags and error propagation
  • Playwright availability consistency — Verify that conditional imports, PLAYWRIGHT_AVAILABLE flags, and guard clauses are consistent and correct across all affected files
  • XPath expression changes — Validate that updated XPath expressions in app/utils/xpath.py still match the intended data and don't break existing queries
  • Error handling flow — Ensure exception re-raising, logging, and error messages with URL context are consistent and don't mask underlying issues
  • New /debug/scraping endpoint — Verify sequential scraping logic, aggregation of results, and response structure match intended behavior

Poem

🐰 Browsers and fallbacks, we now embrace,
Playwright joins the race with graceful grace,
When pages resist, our XPath runs deep,
Error guards watch—no secrets to keep,
Debug endpoints sparkle, robust and bright,

📜 Recent review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5da0e and 2ef4f62.

📒 Files selected for processing (10)
  • Dockerfile (1 hunks)
  • app/api/endpoints/clubs.py (1 hunks)
  • app/api/endpoints/competitions.py (1 hunks)
  • app/main.py (2 hunks)
  • app/services/base.py (8 hunks)
  • app/services/clubs/players.py (3 hunks)
  • app/services/clubs/search.py (2 hunks)
  • app/services/competitions/search.py (1 hunks)
  • app/utils/xpath.py (1 hunks)
  • requirements.txt (1 hunks)


eskobar95 merged commit 057f689 into main on Dec 6, 2025
1 of 2 checks passed