Dump followers and follows DIDs for Bluesky users to Parquet files.
A command-line tool that retrieves follower and following relationships from Bluesky (AT Protocol) and exports them to Apache Parquet files for analysis.
- Retrieve follows (users that a given account follows)
- Retrieve followers (users that follow a given account)
- Output to Parquet format for efficient storage and querying
- Configurable page size, buffer size, and retry logic
- Progress tracking with progress bars
- Rate limiting to avoid API bans
- Typed error handling with smart retry logic (retries only on rate limit errors)
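The "retry only on rate limit errors" behaviour can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the variant names of `GetGraphError`, the helper `with_retry`, and its signature are assumptions, and the sketch uses only the standard library where the project itself relies on `thiserror`.

```rust
use std::fmt;

/// Hypothetical mirror of the project's error categories (rate limiting,
/// bad requests, login failures, unexpected errors). Variant names are assumed.
#[derive(Debug, PartialEq)]
enum GetGraphError {
    RateLimited,
    BadRequest(String),
    LoginFailed(String),
    Unexpected(String),
}

impl fmt::Display for GetGraphError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GetGraphError::RateLimited => write!(f, "rate limited"),
            GetGraphError::BadRequest(m) => write!(f, "bad request: {m}"),
            GetGraphError::LoginFailed(m) => write!(f, "login failed: {m}"),
            GetGraphError::Unexpected(m) => write!(f, "unexpected error: {m}"),
        }
    }
}

impl std::error::Error for GetGraphError {}

/// Calls `op` until it succeeds, retrying only when it reports `RateLimited`.
/// Every other error is returned immediately. At most `max_retry` retries
/// are made, so `op` runs at most `max_retry + 1` times.
fn with_retry<T, F>(max_retry: u32, mut op: F) -> Result<T, GetGraphError>
where
    F: FnMut() -> Result<T, GetGraphError>,
{
    let mut retries = 0;
    loop {
        match op() {
            Err(GetGraphError::RateLimited) if retries < max_retry => {
                // A real client would wait or back off here before retrying.
                retries += 1;
            }
            // Success, a non-retryable error, or retries exhausted.
            other => return other,
        }
    }
}
```

With this shape, a bad request or login failure surfaces after a single attempt, while a rate-limited call is repeated up to the `--max-retry` budget.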
# Set up environment variables
cp .env.example .env
# Edit .env with your BSKY_LOGIN and BSKY_PASSWORD
# Fetch follows (default)
cargo run -- --input-file users.txt --output-dir ./output
# Fetch followers instead
cargo run -- --input-file users.txt --output-dir ./output --follower

| Argument | Short | Default | Description |
|---|---|---|---|
| `--input-file` | `-i` | (required) | Input file containing list of Bluesky DIDs |
| `--limit` | `-l` | 100 | Page size for Bluesky API requests |
| `--buf-size` | `-b` | 100 | Buffer size for Parquet writing |
| `--output-dir` | `-o` | `./output` | Output directory for Parquet files |
| `--log-file` | `-w` | `bsky-graph.log` | Log file path |
| `--max-retry` | `-m` | 10 | Maximum number of retries before failing |
| `--follower` | `-f` | false | Fetch followers instead of follows |
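As an illustration of what the `--input-file` argument might contain, the sketch below reads a list of DIDs from text. It assumes one DID per line and skips anything that does not start with `did:`; the actual file format accepted by the tool may differ, and the function names `parse_dids` and `read_dids` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Extracts DIDs from the given text, assuming one DID per line.
/// Surrounding whitespace is trimmed; blank lines and lines that do not
/// start with "did:" are skipped.
fn parse_dids(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| line.starts_with("did:"))
        .map(String::from)
        .collect()
}

/// Reads the whole input file into memory and parses it with `parse_dids`.
fn read_dids(path: &Path) -> io::Result<Vec<String>> {
    Ok(parse_dids(&fs::read_to_string(path)?))
}
```

Under these assumptions, a `users.txt` holding the lines `did:plc:abc` and `did:web:example.com` would yield two entries.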
- Rust 2024 edition
- Bluesky account credentials
- Added `--follower` flag to choose between fetching followers or follows
- Enhanced command-line interface with better descriptions
- Fixed rate limiter: unified to 5 requests/second for both followers and follows
- Added info logging to indicate which type is being fetched
- Added `thiserror` dependency for typed error handling
- Created `GetGraphError` enum with variants for rate limiting, bad requests, login failures, and unexpected errors
- Improved error handling in `get_follows` and `get_follower` with smart retry logic (only retry on rate limit errors)
- Adjusted rate limits: 5 requests/second for follows, 3 requests/second for followers
- Changed rate limit from 600 to 3 requests/second
- Added rate limiting using `governor` crate (600 requests/second) to prevent API bans
- Removed manual 100ms delay between requests
- Improved error handling in parquet writer
- Initial release
This project is licensed under the MIT License - see the LICENSE file for details.
Vincent Gauthier vincent.gauthier@telecom-sudparis.eu - Telecom SudParis