A comprehensive command-line tool for scraping various sections of Hacker News, built with Go and the Colly web scraping framework.
- Front Page - Scrape the main Hacker News front page
- News - Scrape the latest news stories
- Ask HN - Scrape Ask Hacker News posts
- Show HN - Scrape Show Hacker News posts
- Jobs - Scrape job postings
- New Comments - Scrape the latest comments across the site
- Fetch Ask Comments - Get detailed comments for specific Ask HN posts
- User Submissions - View all posts submitted by a specific user
- User Threads - View all comments made by a specific user
- User Favorites - View posts favorited by a specific user
- User Info - Get detailed profile information for any user
- Clone the repository:
git clone https://github.com/pimatis/hckscrp.git
cd hckscrp
- Install dependencies:
go mod tidy
- Build the application:
go build -o hckscrp main.go
Run the application:
./hckscrp
Or, if you prefer, run it with go run:
go run main.go
You'll be presented with an interactive menu:
Welcome to HCKSCRP - Hacker News Scraper
Available commands:
1. Front Page
2. News
3. Ask
4. Show
5. Jobs
6. New Comments
7. Fetch Ask Comments
8. User Submissions
9. User Threads
10. User Favorites
11. User Info
12. Exit
-----------------------------------------
Enter command (1-12):
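The menu prompt above implies some validation of the entered number. A minimal sketch of how such input could be checked follows; the function name `parseCommand` and the error messages are hypothetical and are not taken from the hckscrp source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCommand converts raw menu input into a command number in [1, 12].
// Hypothetical helper: hckscrp may validate input differently.
func parseCommand(input string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(input))
	if err != nil {
		return 0, fmt.Errorf("not a number: %q", input)
	}
	if n < 1 || n > 12 {
		return 0, fmt.Errorf("command %d is outside 1-12", n)
	}
	return n, nil
}

func main() {
	// Sample inputs, including trailing newline and invalid values.
	for _, in := range []string{"7\n", " 12 ", "0", "abc"} {
		cmd, err := parseCommand(in)
		if err != nil {
			fmt.Println("invalid input:", err)
			continue
		}
		fmt.Println("selected command", cmd)
	}
}
```

Trimming whitespace before parsing matters because a line read from the terminal usually ends with a newline.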
- Select options 1-6 and enter a page number when prompted
- Page numbers start from 1 (default)
- Select option 7 and enter a specific item ID
- Example: 44178902 for a specific Ask HN post
- Select options 8-11 and enter a username when prompted
- For submissions, threads, and favorites, you can also specify a page number
- Example username: queaxtra
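The page number, item ID, and username entered above presumably end up in Hacker News URLs. The sketch below shows one way those URLs could be assembled. The helper names (`listURL`, `itemURL`, `userURL`) are hypothetical, and the use of a `p` query parameter for paging is an assumption that may not hold for every section.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

const baseURL = "https://news.ycombinator.com"

// listURL builds a paginated listing URL such as /news?p=2.
// Assumption: the section accepts a "p" query parameter for paging.
func listURL(section string, page int) string {
	if page < 1 {
		page = 1 // page numbers start from 1
	}
	q := url.Values{}
	q.Set("p", strconv.Itoa(page))
	return baseURL + "/" + section + "?" + q.Encode()
}

// itemURL builds the URL of a single post, e.g. an Ask HN thread.
func itemURL(id int) string {
	q := url.Values{}
	q.Set("id", strconv.Itoa(id))
	return baseURL + "/item?" + q.Encode()
}

// userURL builds a per-user URL; path is e.g. "user" or "submitted".
func userURL(path, username string) string {
	q := url.Values{}
	q.Set("id", username) // Encode escapes unusual characters
	return baseURL + "/" + path + "?" + q.Encode()
}

func main() {
	fmt.Println(listURL("news", 2))
	fmt.Println(itemURL(44178902))
	fmt.Println(userURL("user", "queaxtra"))
}
```

Building the query with `url.Values` rather than string concatenation keeps user-supplied input properly escaped.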
All scraped data is displayed in formatted tables with relevant columns:
Story listings:
- Rank
- Title (truncated to 50 characters)
- Domain/URL
- Score (when available)
- Author
- Time posted
- Comment count
Comments:
- Comment ID
- Author
- Time posted
- Content (truncated for readability)
- Story context
User info:
- Username
- Account creation date
- Karma score
- About section (if available)
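As an illustration of the truncated, column-aligned output described above, here is a minimal sketch using only the Go standard library. The `truncate` helper, the `story` type, and the sample rows are placeholders made up for this example; the tool's actual table rendering may differ.

```go
package main

import (
	"fmt"
	"os"
	"text/tabwriter"
)

// truncate shortens s to at most max runes, ending in "..." when cut.
// Working on runes avoids splitting multi-byte characters.
func truncate(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	if max <= 3 {
		return string(r[:max])
	}
	return string(r[:max-3]) + "..."
}

// story is a hypothetical row type for this sketch.
type story struct {
	Rank   int
	Title  string
	Domain string
	Score  int
	Author string
}

func main() {
	// Placeholder data, not real scraped results.
	stories := []story{
		{1, "A short title", "example.com", 120, "alice"},
		{2, "A much longer title that will certainly not fit in fifty characters", "example.org", 45, "bob"},
	}

	// tabwriter pads tab-separated cells so columns line up.
	w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
	fmt.Fprintln(w, "Rank\tTitle\tDomain\tScore\tAuthor")
	for _, s := range stories {
		fmt.Fprintf(w, "%d\t%s\t%s\t%d\t%s\n",
			s.Rank, truncate(s.Title, 50), s.Domain, s.Score, s.Author)
	}
	w.Flush()
}
```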
- Pagination Support: All relevant scrapers support multiple pages
- Automatic URL Handling: Properly formats both external and internal Hacker News links
- Error Handling: Graceful handling of network errors and missing data
- Clean Output: Formatted tables with appropriate column widths and text truncation
- Interactive Menu: Easy-to-use command-line interface
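The URL handling mentioned in the list above can be pictured as resolving each scraped `href` against the site's base URL. The sketch below uses `net/url` from the standard library; the function name `resolveLink` is hypothetical, and the tool itself may handle links differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink turns an href found on a Hacker News page into an absolute URL.
// Internal links such as "item?id=123" are resolved against the site root;
// external links that are already absolute pass through unchanged.
func resolveLink(href string) (string, error) {
	base, err := url.Parse("https://news.ycombinator.com/")
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	for _, href := range []string{"item?id=44178902", "https://example.com/post"} {
		abs, err := resolveLink(href)
		if err != nil {
			fmt.Println("bad link:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```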
Please be respectful when using this scraper:
- Don't make too many rapid requests
- Consider adding delays between requests for heavy usage
- Follow Hacker News' robots.txt and terms of service
Feel free to submit issues and enhancement requests!
This project is open source and available under the MIT License.
Created by Pimatis Labs