# web_scrapping

A Python web scraping project that fetches pages with `requests`, parses them with `BeautifulSoup`, and processes the extracted data with `pandas`.
## Features

- Web scraping using `requests` and `BeautifulSoup`
- Data processing with `pandas`
- CSV export functionality
- Virtual environment setup
## Prerequisites

- Python 3.x
- Virtual environment (`venv`)
## Installation

1. Clone the repository (if applicable):

   ```bash
   git clone <repository-url>
   cd web_scrapping
   ```

2. Create and activate the virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate    # On macOS/Linux
   # or: venv\Scripts\activate   (on Windows)
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Dependencies

- `pandas` - Data manipulation and analysis
- `requests` - HTTP library for making requests
- `beautifulsoup4` - HTML/XML parsing library
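A minimal sketch of how these three libraries fit together in a scrape-parse-export pipeline. This is an illustration, not the actual contents of `web_scrapping.py`: the HTML snippet, tag names, and the `output.csv` filename are assumptions, and the HTML is inlined here so the sketch runs without network access (in a real script it would come from `requests.get(url).text`).

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page fragment; in web_scrapping.py this string would be
# fetched with requests.get(url).text instead of being hard-coded.
html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.99</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# First row holds the column headers, the remaining rows hold the data.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
data = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows[1:]
]

# Load into pandas for processing, then export to CSV.
df = pd.DataFrame(data, columns=headers)
df["Price"] = df["Price"].astype(float)
df.to_csv("output.csv", index=False)  # hypothetical output filename
```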
## Usage

1. Activate the virtual environment:

   ```bash
   source venv/bin/activate
   ```

2. Run the web scraping script:

   ```bash
   python web_scrapping.py
   ```

3. Deactivate when done:

   ```bash
   deactivate
   ```
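To check the exported data after a run, the CSV can be read back with `pandas`. A small self-contained round-trip example, assuming nothing about the real output (`scraped_data.csv` and its columns are placeholder names; check `web_scrapping.py` for the actual filename):

```python
import pandas as pd

# Write a small CSV the same way the script exports its data...
df = pd.DataFrame({"title": ["Example A", "Example B"], "count": [3, 7]})
df.to_csv("scraped_data.csv", index=False)  # placeholder filename

# ...then read it back to verify the export round-trips cleanly.
loaded = pd.read_csv("scraped_data.csv")
print(loaded.shape)  # (2, 2)
```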
## Project Structure

```
web_scrapping/
├── venv/               # Virtual environment
├── .gitignore          # Git ignore rules
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── web_scrapping.py    # Main scraping script
```
## Git Ignore

The following files are automatically ignored by git:

- `*.csv` - Data files
- `venv/` - Virtual environment
- `__pycache__/` - Python cache
- `.DS_Store` - macOS system files
- Various temporary and IDE files
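A `.gitignore` covering the rules above might look like the following. This is a sketch of the patterns listed, not the repository's actual file, which may include additional editor- and IDE-specific entries:

```gitignore
# Data files
*.csv

# Virtual environment
venv/

# Python cache
__pycache__/

# macOS system files
.DS_Store
```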
## Notes

- Always activate the virtual environment before running the script
- The script generates CSV files that are automatically ignored by git
- Use `./venv/bin/python web_scrapping.py` as an alternative to activating the environment
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Commit and push
5. Create a pull request
## License

This project is open source and available under the MIT License.