Skip to content

fix: Make playwright optional for Railway deployment#4

Merged
eskobar95 merged 32 commits intomainfrom
fix/anti-scraping-optimization
Dec 6, 2025
Merged

fix: Make playwright optional for Railway deployment#4
eskobar95 merged 32 commits intomainfrom
fix/anti-scraping-optimization

Conversation

@eskobar95
Copy link
Copy Markdown
Owner

🔧 Railway Deployment Fix

This PR fixes the Railway deployment issue where the app failed to start due to missing playwright module.

🐛 Problem

  • Railway deployment failed with ModuleNotFoundError: No module named 'playwright'
  • App couldn't start because playwright was imported directly without error handling

✅ Solution

  1. Optional Playwright Import

    • Made playwright import optional with try/except
    • App can now start without playwright installed
    • Browser scraping is automatically disabled if playwright is unavailable
  2. Graceful Fallback

    • _browser_scraper is None if playwright unavailable
    • HTTP-first strategy still works perfectly
    • All endpoints function normally without browser scraping
  3. Dockerfile Update

    • Added playwright install chromium step
    • Only runs if playwright is installed (graceful fallback)
  4. Requirements Update

    • Added playwright==1.48.0 to requirements.txt
  5. Test Endpoint Fix

    • Updated /test/browser-scraping to handle missing playwright gracefully

📊 Impact

  • ✅ App can deploy on Railway without playwright
  • ✅ App starts successfully even if playwright installation fails
  • ✅ HTTP scraping continues to work (primary method)
  • ✅ Browser scraping available if playwright is installed

🧪 Testing

  • App starts successfully without playwright
  • All HTTP endpoints work correctly
  • Browser scraping disabled gracefully when playwright unavailable
  • Monitoring endpoints show correct browser availability status

📝 Additional Fixes

  • Fixed placeholder value inconsistency (CodeRabbit review)
  • Added missing docstrings for __init__ methods
  • Resolved all linting errors

Ready for Review

- Add full support for national teams across all club endpoints
- Add new /clubs/{club_id}/competitions endpoint to retrieve club competitions
- Add isNationalTeam field to Club Profile response schema
- Make Club Profile fields optional to accommodate national teams
- Enhance Club Players endpoint to handle national team HTML structure
- Update XPath expressions to support both club and national team structures
- Add intelligent detection logic for national teams
- Maintain backward compatibility with existing club endpoints

This update enables the API to work seamlessly with both regular clubs
and national teams, providing a unified interface for all club-related
data retrieval.
- Add GET /competitions/{competition_id}/seasons endpoint
- Implement TransfermarktCompetitionSeasons service to scrape season data
- Add CompetitionSeason and CompetitionSeasons Pydantic schemas
- Support both cross-year (e.g., 25/26) and single-year (e.g., 2025) seasons
- Handle historical seasons correctly (e.g., 99/00 -> 1999-2000)
- Extract seasons from competition page dropdown/table structure
- Return season_id, season_name, start_year, and end_year for each season
- Sort seasons by start_year descending (newest first)

Closes #[issue-number]
- Detect national team competitions (FIWC, EURO, COPA, AFAC, GOCU, AFCN)
- Use /teilnehmer/pokalwettbewerb/ URL for national team competitions
- Handle season_id correctly (year-1 for national teams in URL)
- Add XPath expressions for participants table
- Limit participants to expected tournament size to exclude non-qualified teams
- Make season_id optional in CompetitionClubs schema
- Update Dockerfile PYTHONPATH configuration
- Add length validation for ids and names before zip() to prevent silent data loss
- Raise descriptive ValueError with logging if ids and names mismatch
- Simplify seasonId assignment logic for national teams
- Remove unnecessary try/except block (isdigit() prevents ValueError)
- Clean up unreachable fallback code
- Add tournament size configuration to Settings class with environment variable support
- Replace hardcoded dict with settings.get_tournament_size() method
- Add warning logging when tournament size is not configured (instead of silent truncation)
- Proceed without truncation when size is unavailable (no silent data loss)
- Add validation for tournament sizes (must be positive integers)
- Add comprehensive unit tests for both configured and fallback paths
- Update README.md with new environment variables documentation

This prevents silent truncation when tournament sizes change (e.g., World Cup expanding to 48)
and allows easy configuration via environment variables.
- Remove extra HTTP request to fetch club profile just to read isNationalTeam
- Set is_national_team=None to let TransfermarktClubPlayers use DOM heuristics
- Remove broad except Exception that silently swallowed all errors
- Improve performance by eliminating redundant network call
- Players class already has robust DOM-based detection for national teams
- Move datetime and HTTPException imports from method level to module level
- Improves code readability and marginally improves performance
- Follows Python best practices for import organization
- Move datetime and HTTPException imports from method level to module level
- Improves code readability and marginally improves performance
- Follows Python best practices for import organization
- Keep imports at module level in clubs/competitions.py (from CodeRabbit review)
- Preserve is_national_team flag logic in clubs/players.py
- Keep name padding in competitions/search.py
- Add .DS_Store to .gitignore
- Remove whitespace from blank lines (W293)
- Add missing trailing commas (COM812)
- Split long XPath lines to comply with E501 line length limit
- Format XPath strings to comply with line length
- Format list comprehensions
- Format is_season condition
- Fix session initialization issue causing all HTTP requests to fail
- Improve block detection to avoid false positives
- Optimize browser scraping delays (reduce from 12-13s to 0.4-0.8s)
- Update XPath definitions for clubs, competitions, and players search
- Fix nationalities parsing in player search (relative to each row)
- Add comprehensive monitoring endpoints
- Update settings for anti-scraping configuration

Performance improvements:
- HTTP success rate: 0% → 100%
- Response time: 12-13s → 0.4-0.8s
- Browser fallback: Always → Never needed
- All endpoints now working correctly
Resolved conflicts:
- app/services/clubs/players.py: Kept improved nationalities parsing with trim()
- app/settings.py: Kept anti-scraping configuration settings
- app/utils/xpath.py: Combined URL from HEAD with robust NAME fallbacks from main
- Fix import sorting
- Add trailing commas
- Replace single quotes with double quotes
- Add noqa comments for long lines (User-Agent strings, XPath definitions)
- Remove unused variables
- Fix whitespace issues
- Change padding logic for players_joined_on, players_joined, and players_signed_from
- Use "" instead of None to match the default value when elements are None
- Fixes CodeRabbit review: inconsistent placeholder values
- Add try/except for playwright import to handle missing dependency
- Make _browser_scraper optional (None if playwright unavailable)
- Add checks in make_request_with_browser_fallback and get_monitoring_stats
- Update test_browser_scraping endpoint to handle missing playwright
- Add playwright to requirements.txt
- App can now start without playwright, browser scraping disabled if unavailable
- Add playwright install chromium step in Dockerfile
- Only runs if playwright is installed (graceful fallback)
- Ensures browser binaries are available for Railway deployment
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 6, 2025

Warning

Rate limit exceeded

@eskoubar95 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 53 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 5b04ef1 and bdca1cc.

📒 Files selected for processing (4)
  • Dockerfile (1 hunks)
  • app/main.py (1 hunks)
  • app/services/base.py (5 hunks)
  • requirements.txt (1 hunks)
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/anti-scraping-optimization

Comment @coderabbitai help to get the list of available commands and usage tips.

- Keep optional playwright import in base.py
- Keep playwright availability check in test_browser_scraping endpoint
- Maintains Railway deployment compatibility
- Keep optional playwright import and checks
- Maintain browser scraper optional initialization
- Preserve playwright availability checks in monitoring stats
- All conflicts resolved, ready for merge
@eskobar95 eskobar95 merged commit fc6632d into main Dec 6, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant