A command-line interface (CLI) tool to scrape tweets from a specified Twitter user using the twikit library. This tool supports logging in to Twitter and saving session cookies for persistent scraping sessions.
- Log in to Twitter and save session cookies.
- Scrape tweets by user ID or screen name.
- Save scraped tweets to a CSV file.

1. Clone the repository:

       git clone https://github.com/thxrhmn/twitter-scraper-cli.git
       cd twitter-scraper-cli

2. Create a virtual environment and activate it:

       python -m venv venv
       source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

3. Install the required packages:

       pip install -r requirements.txt

4. Set up your environment variables. Create a `.env` file in the project root directory with the following content:

       USERNAME=your_username
       EMAIL=your_email
       PASSWORD=your_password

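Before logging in, it can help to confirm that the `.env` file defines all three keys. The following is a minimal sketch using only the standard library. The helper names `parse_env` and `missing_keys` are hypothetical and are not part of this project, which may load the file differently (for example with a dotenv package).

```python
from pathlib import Path

# The three keys named in the setup step above.
REQUIRED_KEYS = ("USERNAME", "EMAIL", "PASSWORD")


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run from the project root, this prints which of the three keys still need a value.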
Before scraping, you need to log in to your Twitter account to save the session cookies. Run the following command:

    python main.py login

This will log in using the credentials stored in the `.env` file and save the session cookies to `cookies.json`.
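Since scraping depends on the saved session, a quick pre-flight check can tell you whether `cookies.json` is present and readable. This is a rough sketch, not code from the project: the function name `has_saved_session` is hypothetical, and it assumes only that the file contains a non-empty JSON document, without relying on any particular cookie layout.

```python
import json
from pathlib import Path


def has_saved_session(path="cookies.json"):
    """Return True if `path` exists and holds a non-empty JSON document."""
    cookie_file = Path(path)
    if not cookie_file.is_file():
        return False
    try:
        data = json.loads(cookie_file.read_text(encoding="utf-8"))
    except (ValueError, OSError):
        # ValueError covers malformed JSON and undecodable bytes.
        return False
    return bool(data)


if __name__ == "__main__":
    if has_saved_session():
        print("Found saved cookies; ready to scrape.")
    else:
        print("No usable cookies.json; run `python main.py login` first.")
```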
To scrape tweets from a user, run the following command:

    python main.py scrape <identifier> <page_limit> [--by_user_id]

- `identifier`: The user ID or screen name of the Twitter account.
- `page_limit`: The number of additional pages of tweets to fetch.
- `--by_user_id`: Optional flag to specify that the identifier is a user ID. If omitted, the identifier is assumed to be a screen name.
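To make the argument semantics concrete, here is one way the two subcommands could be declared with `argparse`. This is an illustrative sketch under the assumption that the interface matches the usage line above. The project's actual `main.py` may use a different CLI library, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented `login` and `scrape` commands."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Scrape tweets from a Twitter user."
    )
    subcommands = parser.add_subparsers(dest="command", required=True)

    subcommands.add_parser("login", help="Log in and save session cookies.")

    scrape = subcommands.add_parser("scrape", help="Scrape tweets from a user.")
    scrape.add_argument("identifier", help="User ID or screen name.")
    scrape.add_argument(
        "page_limit", type=int, help="Number of additional pages to fetch."
    )
    # Without this flag, the identifier is treated as a screen name.
    scrape.add_argument(
        "--by_user_id", action="store_true", help="Treat identifier as a user ID."
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["scrape", "elonmusk", "100"])
    print(args.command, args.identifier, args.page_limit, args.by_user_id)
    # prints: scrape elonmusk 100 False
```

Declaring `page_limit` with `type=int` means a non-numeric value is rejected by the parser before any scraping starts.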

Scraping by screen name:

    python main.py scrape elonmusk 100

This command scrapes tweets from the user with the screen name `elonmusk` and fetches 100 additional pages of tweets.

Scraping by user ID:

    python main.py scrape 12345678 100 --by_user_id

This command scrapes tweets from the user with the ID `12345678` and fetches 100 additional pages of tweets.

The scraped tweets are saved in a CSV file named `tweets_<identifier>_<timestamp>.csv`, where `<identifier>` is the user ID or screen name, and `<timestamp>` is the current date and time.
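The naming scheme can be sketched as a small helper. Note that the exact timestamp format is not documented here, so the `%Y%m%d_%H%M%S` pattern below is an assumption chosen for illustration, and `output_filename` is a hypothetical name.

```python
from datetime import datetime


def output_filename(identifier, now=None):
    """Build a name of the form tweets_<identifier>_<timestamp>.csv."""
    now = now or datetime.now()
    # Assumed format; the tool's real timestamp layout may differ.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"tweets_{identifier}_{stamp}.csv"


if __name__ == "__main__":
    print(output_filename("elonmusk", datetime(2024, 1, 2, 3, 4, 5)))
    # prints: tweets_elonmusk_20240102_030405.csv
```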
Each row in the CSV file contains the following fields:
- `created_at`: The timestamp when the tweet was created.
- `text`: The text content of the tweet.
- `view_count`: The view count of the tweet.
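For downstream analysis, the output can be read back with Python's `csv` module, which copes with commas and line breaks inside tweet text. The sketch below assumes the file starts with a header row naming the three fields and that `view_count` may be empty for some tweets. Both are assumptions, and the sample rows are made-up placeholders, not real data.

```python
import csv
import io

FIELDS = ["created_at", "text", "view_count"]


def write_rows(handle, rows):
    """Write a header row followed by one row per tweet."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def total_views(handle):
    """Sum view_count over all rows, skipping empty or non-numeric values."""
    total = 0
    for row in csv.DictReader(handle):
        value = (row.get("view_count") or "").strip()
        if value.isdigit():
            total += int(value)
    return total


if __name__ == "__main__":
    sample = [
        {"created_at": "placeholder-1", "text": "Hello, world", "view_count": "120"},
        {"created_at": "placeholder-2", "text": "Line one\nline two", "view_count": ""},
    ]
    buffer = io.StringIO(newline="")  # in-memory stand-in for the CSV file
    write_rows(buffer, sample)
    buffer.seek(0)
    print(total_views(buffer))
    # prints: 120
```

To run this against a real output file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `total_views`.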
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.