
jhilly20/GovCon


Government Opportunity Scrapers

A collection of Python scrapers for monitoring government procurement and research opportunities from various federal agencies. All scrapers integrate with Monday.com for tracking and Slack for notifications.

🚀 Features

  • 20+ scrapers covering SAM.gov, SBIR/STTR, OTA consortia, DARPA, DIU, DHS, and more
  • BaseScraper pattern: New scrapers inherit from BaseScraper for consistent Monday.com/Slack integration and deduplication
  • Slack Integration: Automatic notifications when new opportunities are found
  • Monday.com Integration: Auto-create items in Monday.com boards (opportunities board + event dashboard)
  • Environment-based Configuration: Secure API key management via .env
  • Selenium support: Headless browser scraping for JS-heavy and login-required sites
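
To make the BaseScraper pattern concrete, here is a minimal sketch of how a shared base class could centralize deduplication and pass new items to Monday.com and Slack. It is not the repository's actual BaseScraper: the names `SketchBaseScraper`, `fetch`, `run`, and the `sinks` callables are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable

Opportunity = dict[str, str]
Sink = Callable[[Opportunity], None]


class SketchBaseScraper(ABC):
    """Hypothetical base class; the real BaseScraper may differ."""

    def __init__(self, seen_urls: Iterable[str] = (), sinks: Iterable[Sink] = ()):
        self.seen_urls = set(seen_urls)  # URLs already tracked (e.g. on a board)
        self.sinks = list(sinks)         # stand-ins for Monday.com / Slack clients

    @abstractmethod
    def fetch(self) -> list[Opportunity]:
        """Subclasses return raw opportunities from their source."""

    def run(self) -> list[Opportunity]:
        new_items = []
        for item in self.fetch():
            if item["url"] in self.seen_urls:
                continue                 # skip items that were already seen
            self.seen_urls.add(item["url"])
            for sink in self.sinks:
                sink(item)               # e.g. create a board item, post a message
            new_items.append(item)
        return new_items


class StaticScraper(SketchBaseScraper):
    """Toy subclass returning fixed data instead of calling a live source."""

    def fetch(self) -> list[Opportunity]:
        return [
            {"title": "Topic A", "url": "https://example.com/a"},
            {"title": "Topic B", "url": "https://example.com/b"},
        ]


notified: list[str] = []
scraper = StaticScraper(
    seen_urls={"https://example.com/a"},
    sinks=[lambda item: notified.append(item["title"])],
)
new_items = scraper.run()
```

With this shape, a new scraper only has to implement `fetch`; the duplicate check and the notification fan-out stay in one place.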

📋 Available Scrapers

Opportunity Scrapers (BaseScraper pattern)

These scrapers inherit from BaseScraper and post to the main opportunities Monday.com board.

| Scraper | Source | Method | Verified | Notes |
|---|---|---|---|---|
| dod_sbirsttr_scraper.py | DoD SBIR/STTR | REST API | Yes | Public API, fetches open/pre-release topics |
| darpa_scraper.py | DARPA | RSS feed | Yes | Parses RSS, extracts deadlines from descriptions |
| erdcwerx_scraper.py | ERDCWERX | WordPress API + HTML | Yes | WP REST API for listing, HTML scraping for deadlines |
| diu_scraper.py | DIU | HTML (Nuxt SSR) | Yes | Server-rendered, no JS needed |
| grantsgov_scraper.py | Grants.gov | REST API | Yes | Filters by for-profit/small-biz eligibility codes |
| colosseum_scraper.py | Colosseum (ONI) | HTML | Yes | Public homepage, no login needed |
| challenge_gov_scraper.py | USA.gov Challenges | HTML | Yes | Detail page enrichment with deadlines, prizes, agencies |
| dhs_sbir_scraper.py | DHS SBIR | Selenium | No | Cloudflare-protected; falls back to sbir.gov |
| tradewind_scraper.py | Tradewind AI | Selenium | No | Wix site, CSS selectors need live validation |
| vulcan_sof_scraper.py | Vulcan SOF | Selenium (visible) | No | Requires login + 2FA; runs non-headless for manual 2FA entry |
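
The DARPA entry above mentions extracting deadlines from free-text descriptions. A minimal sketch of that idea follows, assuming dates appear as "Month D, YYYY" in the same sentence as the word "deadline" or "due". Both the pattern and the helper name `extract_deadline` are assumptions for illustration, not the scraper's actual logic.

```python
import re
from datetime import date
from typing import Optional

MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]

# Assumed pattern: "deadline" or "due", then a "Month D, YYYY" date
# before the next period.
_DEADLINE_RE = re.compile(
    r"\b(?:deadline|due)\b[^.]*?\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})",
    re.IGNORECASE,
)


def extract_deadline(description: str) -> Optional[str]:
    """Return the first deadline found as YYYY-MM-DD, or None."""
    match = _DEADLINE_RE.search(description)
    if match is None:
        return None
    month = MONTHS.index(match.group(1).lower()) + 1
    try:
        return date(int(match.group(3)), month, int(match.group(2))).isoformat()
    except ValueError:
        return None  # e.g. "February 31, 2025"
```

For example, `extract_deadline("Proposals are due March 5, 2025 at 4:00 PM ET.")` yields `"2025-03-05"`, while text without a recognizable date yields `None`.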

SAM.gov Scrapers (standalone)

These scrapers predate BaseScraper and have their own Monday.com/Slack integration.

| Scraper | Source | Description |
|---|---|---|
| custom_samgov_search.py | SAM.gov | Custom SAM.gov search template |
| small_biz_samgov_search.py | SAM.gov | NAICS 541715 small business set-aside opportunities |
| industry_day_scraper.py | SAM.gov | Industry Day events; posts to Event Dashboard board |

Event / Other Scrapers

| Scraper | Source | Description |
|---|---|---|
| sda_scraper.py | SDA | Space Development Agency opportunities |
| cfic/ | CFIC / ARCYBER | CyberFIC collaboration events, webinars, and assessments |

Note: Not all scrapers have been verified end-to-end with live Monday.com/Slack integration. The "Verified" column above indicates whether the scraper has been tested against the live source and confirmed to fetch/parse data correctly. Selenium-based scrapers in particular need live validation of CSS selectors, which may change when sites update their layouts.

🛠️ Installation

  1. Clone this repository:
git clone https://github.com/jhilly20/GovCon.git
cd GovCon
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up environment variables:
cp .env.example .env
# Edit .env with your API keys and configuration

⚙️ Configuration

Create a .env file based on .env.example:

Environment Variables

# Monday.com (optional - for tracking opportunities)
MONDAY_API_KEY=your_monday_api_key_here
MONDAY_BOARD_ID=your_board_id_here
MONDAY_EVENT_BOARD_ID=your_event_board_id_here  # Event Dashboard board (industry days)

# Slack (optional - for notifications)
SLACK_BOT_TOKEN=xoxb-your-slack-bot-token-here
SLACK_CHANNEL=your_slack_channel_id_here

# SAM.gov (optional - search works without it)
SAM_API_KEY=your_sam_api_key_here

# Vulcan SOF (required for vulcan_sof_scraper only)
VULCAN_SOF_EMAIL=your_email_here
VULCAN_SOF_PASSWORD=your_password_here

# Colosseum credentials (if login required)
COLOSSEUM_EMAIL=your_email_here
COLOSSEUM_PASSWORD=your_password_here
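
Because most of these variables are optional, a scraper needs to decide at startup which integrations are enabled. The sketch below reads the variable names listed above from a mapping (by default `os.environ`) and exposes simple on/off flags. The `Settings` class and `load_settings` helper are hypothetical, and loading the `.env` file itself into the environment is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    monday_api_key: Optional[str]
    monday_board_id: Optional[str]
    monday_event_board_id: Optional[str]
    slack_bot_token: Optional[str]
    slack_channel: Optional[str]
    sam_api_key: Optional[str]

    @property
    def monday_enabled(self) -> bool:
        # Board sync needs both a key and a target board.
        return bool(self.monday_api_key and self.monday_board_id)

    @property
    def slack_enabled(self) -> bool:
        return bool(self.slack_bot_token and self.slack_channel)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; blank values count as unset."""

    def get(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    return Settings(
        monday_api_key=get("MONDAY_API_KEY"),
        monday_board_id=get("MONDAY_BOARD_ID"),
        monday_event_board_id=get("MONDAY_EVENT_BOARD_ID"),
        slack_bot_token=get("SLACK_BOT_TOKEN"),
        slack_channel=get("SLACK_CHANNEL"),
        sam_api_key=get("SAM_API_KEY"),
    )
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.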

Getting API Keys

🎯 Usage

Command Line Usage

# SAM.gov scrapers
python scrapers/custom_samgov_search.py      # Custom SAM.gov search
python scrapers/small_biz_samgov_search.py   # Small business set-asides
python scrapers/industry_day_scraper.py      # Industry Day events (Event Dashboard)

# BaseScraper-based opportunity scrapers
python scrapers/dod_sbirsttr_scraper.py      # DoD SBIR/STTR topics
python scrapers/darpa_scraper.py             # DARPA opportunities (RSS)
python scrapers/erdcwerx_scraper.py          # ERDCWERX tech challenges
python scrapers/diu_scraper.py               # DIU open solicitations
python scrapers/grantsgov_scraper.py         # Grants.gov (for-profit eligible)
python scrapers/colosseum_scraper.py         # Colosseum / ONI challenges
python scrapers/challenge_gov_scraper.py     # USA.gov challenge competitions

# Selenium-based scrapers (require browser)
python scrapers/dhs_sbir_scraper.py          # DHS SBIR (Cloudflare)
python scrapers/tradewind_scraper.py         # Tradewind AI (Wix)
python scrapers/vulcan_sof_scraper.py        # Vulcan SOF (login + 2FA)

# Other
python scrapers/sda_scraper.py               # Space Development Agency
python -m scrapers.cfic                      # CyberFIC events

🔧 Customization

Custom SAM.gov Search

The custom_samgov_search.py file shows how to create targeted searches. Key parameters to modify:

# Example search parameters
params = {
    "q": "your search terms here",
    "naics": "541715",  # NAICS code for your industry
    "set_aside": "SBP,SBA",  # Small business set-asides
    "notice_type": "p"  # Presolicitations only
}
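
One way to keep several targeted searches manageable is to build the parameter dictionary with a small helper instead of editing it by hand. The sketch below reuses the parameter names from the example above. The helper `build_search_params` is hypothetical, and encoding the result as a URL query string is an assumption about how the scraper sends it.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_search_params(
    terms: str,
    naics: Optional[str] = None,
    set_asides: Sequence[str] = (),
    notice_type: Optional[str] = None,
) -> dict:
    """Assemble search parameters, omitting any filter that is not set."""
    params = {"q": terms}
    if naics:
        params["naics"] = naics
    if set_asides:
        params["set_aside"] = ",".join(set_asides)  # comma-separated codes
    if notice_type:
        params["notice_type"] = notice_type
    return params


params = build_search_params(
    "hypersonics", naics="541715", set_asides=["SBP", "SBA"], notice_type="p"
)
query = urlencode(params)  # commas are percent-encoded as %2C
```

Omitting unset filters avoids sending empty values that a search backend might interpret as "match nothing".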

Small Business Set-Aside Search

The small_biz_samgov_search.py scraper is configured for:

  • NAICS 541715: Research and Development in the Physical, Engineering, and Life Sciences (except Nanotechnology and Biotechnology)
  • Set-asides: Partial Small Business (SBP) and Total Small Business (SBA)
  • Custom Slack labeling: "small biz setaside 541715 sam.gov"

Industry Day Events

The industry_day_scraper.py searches for:

  • Industry Day events: Government-hosted industry days and conferences
  • Search term: "Industry Day" on SAM.gov
  • Notice type: Special notices (type "s")
  • Event Dashboard board: Posts to MONDAY_EVENT_BOARD_ID (separate from the opportunities board)
  • Deduplication: By solicitation number to prevent recreating existing items
  • Detail enrichment: Fetches v2 detail endpoint for authoritative links and topic numbers
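
The deduplication step in the list above can be sketched as follows. This version assumes each notice is a dictionary with a `solicitation_number` key and that the numbers already on the board have been fetched into a list. Both the key name and the helper names are hypothetical. Numbers are normalized first so that differences in case or spacing do not create duplicate items.

```python
from typing import Iterable


def normalize_solnum(value: str) -> str:
    """Strip all whitespace and uppercase, so 'w911-25 r' equals 'W911-25R'."""
    return "".join(value.split()).upper()


def filter_new(notices: Iterable[dict], existing_numbers: Iterable[str]) -> list:
    """Return only notices whose solicitation number is not already tracked."""
    seen = {normalize_solnum(number) for number in existing_numbers}
    fresh = []
    for notice in notices:
        key = normalize_solnum(notice.get("solicitation_number", ""))
        if key:
            if key in seen:
                continue      # already on the board or earlier in this batch
            seen.add(key)
        # Notices without a number cannot be deduplicated and pass through.
        fresh.append(notice)
    return fresh
```

Adding each new key to `seen` as it is accepted also removes duplicates within a single batch of search results.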

CyberFIC Events

The cfic/ package scrapes upcoming events from CyberFIC.org:

  • Collaboration Events (CE): In-person events with purpose/synopsis, RSVP deadlines, PDF releases
  • Assessment Events (AE): Targeted problem events with desirements
  • Connector Series Webinars: Virtual speaker series with key takeaways
  • Q & A Sessions: Pre-submission Q&A with ARCYBER stakeholders
  • Automatically follows detail page links to collect full event information
  • Syncs to Monday.com and sends Slack notifications for new events

Custom Monday.com Integration

If you want to use Monday.com, update the column mappings in each scraper's config section:

# Monday.com column mappings
TITLE_COLUMN = "your_title_column_id"
DESCRIPTION_COLUMN = "your_description_column_id"
URL_COLUMN = "your_url_column_id"
DEADLINE_COLUMN = "your_deadline_column_id"
AGENCY_COLUMN = "your_agency_column_id"

📊 Output

Each scraper returns a list of opportunity dictionaries with the following structure:

{
    "title": "Opportunity Title",
    "description": "Full description",
    "url": "Direct link to opportunity",
    "deadline": "YYYY-MM-DD",
    "agency": "Agency Name",
    "posted_date": "YYYY-MM-DD"
}
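
Because every scraper returns the same shape, downstream code can treat results uniformly. The sketch below types that structure with a `TypedDict` and keeps only opportunities whose deadline has not passed, sorted soonest first. The `Opportunity` type and the `upcoming` helper are illustrative and not part of the repository. Entries with a missing or empty deadline are skipped here, which is an assumption about how such entries should be handled.

```python
from datetime import date
from typing import Iterable, TypedDict


class Opportunity(TypedDict):
    title: str
    description: str
    url: str
    deadline: str      # "YYYY-MM-DD"
    agency: str
    posted_date: str   # "YYYY-MM-DD"


def upcoming(opportunities: Iterable[Opportunity], today: date) -> list:
    """Return opportunities due today or later, soonest deadline first."""
    open_items = [
        opp for opp in opportunities
        if opp.get("deadline") and date.fromisoformat(opp["deadline"]) >= today
    ]
    # ISO dates sort correctly as plain strings.
    return sorted(open_items, key=lambda opp: opp["deadline"])
```

Taking `today` as a parameter rather than calling `date.today()` inside the function keeps the filter deterministic.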

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/new-scraper
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📅 Planned / In Progress

The following scrapers and integrations are planned for future development:

Upcoming Scrapers (Medium Priority)

| Source | Status | Notes |
|---|---|---|
| MITRE AiDA OTA Consortia | In progress | Per-consortium opportunity parsing; prepend consortium name to titles |
| CyberFIC | Done | Already implemented in cfic/ |
| ICWERX | Planned | |
| NASA SBIR/STTR | Planned | |
| DOE SBIR | Planned | |
| ConnectWERX | Planned | |
| EnergyWERX | Planned | |
| HSWERX | Planned | |

Upcoming Scrapers (Low Priority)

| Source | Status | Notes |
|---|---|---|
| NAM Consortium | Planned | |
| TechConnect | Planned | WordPress REST API available |
| ARL DEVCOM | Planned | |
| NSPIRES (NASA) | Planned | |
| ARPA-E | Planned | |
| EERE Exchange | Planned | |
| DOE PAMS | Planned | |
| DHS Forecast | Planned | |
| Volpe DOT SBIR | Planned | Cloudflare-protected |
| ARPA-I | Planned | |
| NIST SBIR | Planned | |
| NOAA SBIR | Planned | |

Calendar Integrations (Lower Priority)

Future integration with event calendars for automatic syncing:

📝 Notes

  • These scrapers are provided for educational and research purposes
  • Always respect website terms of service and rate limits
  • Some sites may require additional authentication or have anti-scraping measures
  • Consider adding delays between requests to avoid overwhelming servers
  • Not all scrapers have been verified end-to-end yet. Scrapers that use APIs, RSS feeds, or static HTML (DoD SBIR, DARPA, ERDCWERX, DIU, Grants.gov, Colosseum, Challenge.gov) have been tested against live sources. Selenium-based scrapers (DHS SBIR, Tradewind, Vulcan SOF) still need live validation of their CSS selectors.
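
The advice above about adding delays between requests can be sketched with a small generator that enforces a minimum interval between the items it yields. The name `throttled` and its parameters are illustrative. The `sleep` and `clock` arguments default to the standard library functions and exist only so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items no faster than one per min_interval seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)   # pause only for the time still remaining
        last = clock()
        yield item
```

A detail-page loop could then be written as `for url in throttled(urls, 2.0): ...`, with the request made inside the loop body. Time spent handling each item counts toward the interval, so slow pages do not add extra waiting.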

⚠️ Disclaimer

This software is not affiliated with any government agency. Users are responsible for ensuring compliance with all applicable laws and terms of service.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments
