This project is a web-based tool built using Streamlit and BeautifulSoup that allows users to extract specific data (tables, headings, or rows) from webpages. The tool enables quick preview, visualization, and CSV export of the scraped data.
- Input any webpage URL
- Select the type of data to extract: full table, headings, or specific rows
- Visualize the extracted data using basic charts
- Export the data to a CSV file
- Simple and user-friendly interface
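The scraping step behind these features could look roughly like this. It is a minimal sketch, not the project's actual `scraper.py`. The function names `fetch_html`, `extract_headings`, `extract_table` and `filter_rows` are hypothetical. Pages are fetched with the standard library's `urllib.request`, an assumption made because no HTTP client such as `requests` appears among the listed dependencies. Parsing uses BeautifulSoup's built-in `"html.parser"`, and `"lxml"` (also listed) could be passed instead.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Hypothetical helpers for illustration; the real scraper.py may differ.
HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (compatible; ai-web-scraper)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_headings(html: str) -> list[str]:
    """Return the text of every <h1>-<h6> element in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all(HEADING_TAGS)]


def extract_table(html: str, index: int = 0) -> list[list[str]]:
    """Return the rows of the index-th <table> as lists of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if not 0 <= index < len(tables):
        raise ValueError(f"Page has {len(tables)} table(s); index {index} is out of range")
    rows = []
    # find_all is recursive, so rows of nested tables are included as well.
    for tr in tables[index].find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows with no cells
            rows.append(cells)
    return rows


def filter_rows(rows: list[list[str]], keyword: str) -> list[list[str]]:
    """Keep the header row plus every row with a cell containing keyword (case-insensitive)."""
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    needle = keyword.lower()
    return [header] + [row for row in body if any(needle in cell.lower() for cell in row)]
```

In the app, a call such as `extract_table(fetch_html(url))` would produce rows that can be previewed (for example with `st.dataframe`) and then exported.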
Clone the repository:

```bash
git clone https://github.com/yourusername/ai-web-scraper.git
cd ai-web-scraper
```

Create and activate a virtual environment. On Windows:

```
python -m venv venv
venv\Scripts\activate
```

On macOS/Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

Install the dependencies and start the app:

```bash
pip install -r requirements.txt
streamlit run app.py
```

Project structure:

```
app/
├── app.py            # Main Streamlit app
├── scraper.py        # Scraping logic
├── interpreter.py    # AI prompt → selector logic (optional)
├── utils.py          # CSV export, error handling
└── requirements.txt
```
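The CSV export and error handling attributed to `utils.py` might be sketched as follows. The helper names `validate_url` and `rows_to_csv_bytes` are hypothetical, and the sketch uses only the standard library (`csv`, `io`, `urllib.parse`), whereas the actual project may rely on pandas for export.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical utils.py helpers for illustration only.


def validate_url(url: str) -> str:
    """Strip whitespace and reject anything that is not an absolute http(s) URL."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Please enter a full http(s) URL, got {cleaned!r}")
    return cleaned


def rows_to_csv_bytes(rows: list[list[str]]) -> bytes:
    """Serialize rows (first row is the header) to UTF-8 encoded CSV."""
    buffer = io.StringIO()
    # csv.writer quotes cells that contain commas, quotes or line breaks.
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8")
```

The returned bytes can be handed to Streamlit's download button, for example `st.download_button("Download CSV", data=csv_bytes, file_name="scraped_data.csv", mime="text/csv")`. Validating the URL first lets the app show a readable message with `st.error` instead of a traceback.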
If you use an OpenAI API key or other secrets, create `.streamlit/secrets.toml` like this:

```toml
[openai]
api_key = "your-openai-api-key"
```

Access it in your code as:

```python
st.secrets["openai"]["api_key"]
```

Dependencies:

- streamlit
- pandas
- beautifulsoup4
- lxml
- matplotlib
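To reuse the OpenAI key lookup from `secrets.toml` in code that may also run outside Streamlit (tests, scripts), it can be wrapped in a small helper. This is a hypothetical sketch: `get_openai_api_key` and the `OPENAI_API_KEY` environment-variable fallback are assumptions, not part of the project. It accepts `st.secrets` or a plain dict with the same nested layout. Depending on the Streamlit version, a missing secrets file may raise an exception other than the ones caught here, so the `except` clause may need adjusting.

```python
from __future__ import annotations

import os


def get_openai_api_key(secrets) -> str | None:
    """Return [openai] api_key from a secrets mapping, else $OPENAI_API_KEY.

    Hypothetical helper: `secrets` can be st.secrets or a plain dict
    shaped like .streamlit/secrets.toml.
    """
    try:
        key = secrets["openai"]["api_key"]
    except (KeyError, TypeError, FileNotFoundError):
        # Missing section or key, a non-mapping section, or (in some
        # Streamlit versions) no secrets file at all.
        key = None
    # An empty key also falls back to the environment variable.
    return key or os.environ.get("OPENAI_API_KEY")
```

Inside the app this would be called as `get_openai_api_key(st.secrets)`.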
You can deploy this project on Streamlit Community Cloud: push the code to a GitHub repository, deploy the app from it, and paste the contents of your `secrets.toml` into the app's Settings → Secrets section.