fix: Railway deployment fixes and debugging improvements by eskobar95 · Pull Request #5 · eskobar95/transfermarkt-api

eskobar95 · 2025-12-06T16:19:09Z

🔧 Railway Deployment Fixes

This PR fixes issues with Railway deployment where endpoints return 200 OK but with empty results.

🐛 Problem

Endpoints return 200 OK but with empty on Railway
Works perfectly locally but fails on server
No exceptions thrown, just empty data

✅ Fixes

Better Error Handling
- Validate HTTP responses have content before returning
- Validate browser scraping content before using
- Raise proper exceptions when content is empty or invalid
Enhanced Logging
- Log page HTML length and content validation
- Log XPath extraction results
- Warn when no results found
- Detailed error messages for debugging
Validation Improvements
- Validate page is not None after request_url_page()
- Add exception handling in endpoints to catch and log errors
- Add XPath error handling with detailed error messages
Debug Endpoint
- New /debug/scraping endpoint to test HTTP, browser, and page requests
- Shows content lengths, errors, and availability status
- Helps diagnose why server returns empty responses

📊 Changes

app/services/base.py: Better validation and error handling
app/services/clubs/search.py: Detailed logging for empty results
app/services/competitions/search.py: Validation in post_init
app/api/endpoints/clubs.py: Exception handling and warnings
app/api/endpoints/competitions.py: Exception handling and warnings
app/main.py: New debug endpoint

🧪 Testing

After deployment, check Railway logs for:

Page HTML length
XPath extraction results
Warnings when no results found
Detailed error messages

Use /debug/scraping endpoint to diagnose issues.

Ready for Review ✅

- Add full support for national teams across all club endpoints - Add new /clubs/{club_id}/competitions endpoint to retrieve club competitions - Add isNationalTeam field to Club Profile response schema - Make Club Profile fields optional to accommodate national teams - Enhance Club Players endpoint to handle national team HTML structure - Update XPath expressions to support both club and national team structures - Add intelligent detection logic for national teams - Maintain backward compatibility with existing club endpoints This update enables the API to work seamlessly with both regular clubs and national teams, providing a unified interface for all club-related data retrieval.

…ne length)

- Add GET /competitions/{competition_id}/seasons endpoint - Implement TransfermarktCompetitionSeasons service to scrape season data - Add CompetitionSeason and CompetitionSeasons Pydantic schemas - Support both cross-year (e.g., 25/26) and single-year (e.g., 2025) seasons - Handle historical seasons correctly (e.g., 99/00 -> 1999-2000) - Extract seasons from competition page dropdown/table structure - Return season_id, season_name, start_year, and end_year for each season - Sort seasons by start_year descending (newest first) Closes #[issue-number]

- Detect national team competitions (FIWC, EURO, COPA, AFAC, GOCU, AFCN) - Use /teilnehmer/pokalwettbewerb/ URL for national team competitions - Handle season_id correctly (year-1 for national teams in URL) - Add XPath expressions for participants table - Limit participants to expected tournament size to exclude non-qualified teams - Make season_id optional in CompetitionClubs schema - Update Dockerfile PYTHONPATH configuration

- Add length validation for ids and names before zip() to prevent silent data loss - Raise descriptive ValueError with logging if ids and names mismatch - Simplify seasonId assignment logic for national teams - Remove unnecessary try/except block (isdigit() prevents ValueError) - Clean up unreachable fallback code

- Add tournament size configuration to Settings class with environment variable support - Replace hardcoded dict with settings.get_tournament_size() method - Add warning logging when tournament size is not configured (instead of silent truncation) - Proceed without truncation when size is unavailable (no silent data loss) - Add validation for tournament sizes (must be positive integers) - Add comprehensive unit tests for both configured and fallback paths - Update README.md with new environment variables documentation This prevents silent truncation when tournament sizes change (e.g., World Cup expanding to 48) and allows easy configuration via environment variables.

- Remove extra HTTP request to fetch club profile just to read isNationalTeam - Set is_national_team=None to let TransfermarktClubPlayers use DOM heuristics - Remove broad except Exception that silently swallowed all errors - Improve performance by eliminating redundant network call - Players class already has robust DOM-based detection for national teams

- Move datetime and HTTPException imports from method level to module level - Improves code readability and marginally improves performance - Follows Python best practices for import organization

- Keep imports at module level in clubs/competitions.py (from CodeRabbit review) - Preserve is_national_team flag logic in clubs/players.py - Keep name padding in competitions/search.py - Add .DS_Store to .gitignore

- Remove whitespace from blank lines (W293) - Add missing trailing commas (COM812) - Split long XPath lines to comply with E501 line length limit

- Format XPath strings to comply with line length - Format list comprehensions - Format is_season condition

- Fix session initialization issue causing all HTTP requests to fail - Improve block detection to avoid false positives - Optimize browser scraping delays (reduce from 12-13s to 0.4-0.8s) - Update XPath definitions for clubs, competitions, and players search - Fix nationalities parsing in player search (relative to each row) - Add comprehensive monitoring endpoints - Update settings for anti-scraping configuration Performance improvements: - HTTP success rate: 0% → 100% - Response time: 12-13s → 0.4-0.8s - Browser fallback: Always → Never needed - All endpoints now working correctly

Resolved conflicts: - app/services/clubs/players.py: Kept improved nationalities parsing with trim() - app/settings.py: Kept anti-scraping configuration settings - app/utils/xpath.py: Combined URL from HEAD with robust NAME fallbacks from main

- Fix import sorting - Add trailing commas - Replace single quotes with double quotes - Add noqa comments for long lines (User-Agent strings, XPath definitions) - Remove unused variables - Fix whitespace issues

- Change padding logic for players_joined_on, players_joined, and players_signed_from - Use "" instead of None to match the default value when elements are None - Fixes CodeRabbit review: inconsistent placeholder values

- Add try/except for playwright import to handle missing dependency - Make _browser_scraper optional (None if playwright unavailable) - Add checks in make_request_with_browser_fallback and get_monitoring_stats - Update test_browser_scraping endpoint to handle missing playwright - Add playwright to requirements.txt - App can now start without playwright, browser scraping disabled if unavailable

- Add playwright install chromium step in Dockerfile - Only runs if playwright is installed (graceful fallback) - Ensures browser binaries are available for Railway deployment

- Keep optional playwright import in base.py - Keep playwright availability check in test_browser_scraping endpoint - Maintains Railway deployment compatibility

- Keep optional playwright import and checks - Maintain browser scraper optional initialization - Preserve playwright availability checks in monitoring stats - All conflicts resolved, ready for merge

- Validate HTTP responses have content before returning - Validate browser scraping content before using - Add detailed logging for debugging deployment issues - Raise proper exceptions when content is empty or invalid - Helps diagnose why server returns 200 with empty content

- Add /debug/scraping endpoint to test HTTP, browser, and page requests - Shows content lengths, errors, and availability status - Helps diagnose why server returns empty responses - Better error messages in request_url_page and make_request_with_browser_fallback

- Validate page is not None after request_url_page() - Add exception handling in endpoints to catch and log errors - Add XPath error handling with detailed error messages - Warn when search results are empty (helps diagnose Railway issues) - Prevents silent failures that return 200 with empty data

- Log page HTML length and content validation - Log XPath extraction results - Warn when no results found - Helps diagnose why server returns empty results while local works

coderabbitai · 2025-12-06T16:19:19Z

Caution

Review failed

The head commit changed during the review from 4d7ddce to 83108ad.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/anti-scraping-optimization

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

eskobar95 added 30 commits November 30, 2025 19:15

chore: trigger CI workflow

37fb157

fix: update workflow runner to ubuntu-latest for better availability

6e8631e

fix: resolve linting errors (ruff formatting and style issues)

9e615e0

fix: resolve all ruff linting errors (whitespace, trailing commas, li…

2f1f4aa

…ne length)

fix: remove whitespace from blank line in competitions search

8cb0393

fix: apply black formatting to competitions and profile services

63a194a

docs: add docstring to pad_list function for interrogate coverage

1bb3c6f

fix: specify Python version in workflow to remove warning

a529933

chore: Add .DS_Store to .gitignore

25cbc7b

refactor: Move imports to module level in clubs competitions service

6b07ff9

- Move datetime and HTTPException imports from method level to module level - Improves code readability and marginally improves performance - Follows Python best practices for import organization

refactor: Move imports to module level in clubs competitions service

085db81

- Move datetime and HTTPException imports from method level to module level - Improves code readability and marginally improves performance - Follows Python best practices for import organization

chore: Resolve merge conflicts with main branch

1383dc9

- Keep imports at module level in clubs/competitions.py (from CodeRabbit review) - Preserve is_national_team flag logic in clubs/players.py - Keep name padding in competitions/search.py - Add .DS_Store to .gitignore

fix: Resolve all Ruff linter errors

27a5f28

- Remove whitespace from blank lines (W293) - Add missing trailing commas (COM812) - Split long XPath lines to comply with E501 line length limit

style: Apply black formatting to seasons.py

cd83f20

- Format XPath strings to comply with line length - Format list comprehensions - Format is_season condition

fix: Resolve all ruff linting errors

a419b41

- Fix import sorting - Add trailing commas - Replace single quotes with double quotes - Add noqa comments for long lines (User-Agent strings, XPath definitions) - Remove unused variables - Fix whitespace issues

fix: Resolve final linting error by breaking up long JavaScript object

ee45357

style: Format code with black

af8fb06

fix: Add missing trailing commas

b254874

docs: Add missing docstrings for __init__ methods

d11069a

fix: Align placeholder values to use empty strings consistently

047078e

- Change padding logic for players_joined_on, players_joined, and players_signed_from - Use "" instead of None to match the default value when elements are None - Fixes CodeRabbit review: inconsistent placeholder values

fix: Install playwright browsers in Dockerfile

547c75b

- Add playwright install chromium step in Dockerfile - Only runs if playwright is installed (graceful fallback) - Ensures browser binaries are available for Railway deployment

eskobar95 added 7 commits December 6, 2025 02:18

fix: Resolve merge conflicts with main branch

9e1f917

- Keep optional playwright import in base.py - Keep playwright availability check in test_browser_scraping endpoint - Maintains Railway deployment compatibility

fix: Resolve all merge conflicts with main branch

bdca1cc

- Keep optional playwright import and checks - Maintain browser scraper optional initialization - Preserve playwright availability checks in monitoring stats - All conflicts resolved, ready for merge

fix: Fix import order and add missing etree import

2e36cae

debug: Add detailed logging to diagnose empty results on Railway

4d7ddce

- Log page HTML length and content validation - Log XPath extraction results - Warn when no results found - Helps diagnose why server returns empty results while local works

fix: Remove unnecessary f-string prefixes (ruff F541)

83108ad

eskobar95 merged commit 7f5da0e into main Dec 6, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Railway deployment fixes and debugging improvements#5

fix: Railway deployment fixes and debugging improvements#5
eskobar95 merged 38 commits intomainfrom
fix/anti-scraping-optimization

eskobar95 commented Dec 6, 2025

Uh oh!

coderabbitai Bot commented Dec 6, 2025 •

edited

Loading

Review failed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

eskobar95 commented Dec 6, 2025

🔧 Railway Deployment Fixes

🐛 Problem

✅ Fixes

📊 Changes

🧪 Testing

Uh oh!

coderabbitai Bot commented Dec 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Dec 6, 2025 •

edited

Loading