Web Scraping Hockey Statistics

Overview

This project demonstrates a basic web-scraping workflow using Python to extract tabular hockey statistics from a web page, transform the data into a structured format, and persist the results to a CSV file for downstream analysis.

The implementation is provided as a Jupyter Notebook (webscraping.ipynb) and is intended for instructional and exploratory use.

Features

Fetches and parses an HTML table containing hockey statistics
Extracts table rows and cells using BeautifulSoup
Cleans extracted text values by stripping leading and trailing whitespace
Stores the extracted data in a Pandas DataFrame
Exports the dataset to a CSV file (Hockey.csv)

Technology Stack

Python 3.x
Jupyter Notebook
Requests (for HTTP requests)
BeautifulSoup (bs4) (for HTML parsing)
Pandas (for data manipulation and storage)

Prerequisites

Ensure the following packages are installed in your Python environment:

pip install requests beautifulsoup4 pandas

To run the notebook, also install Jupyter:

pip install notebook

Project Structure

.
├── webscraping.ipynb   # Main notebook containing the scraping logic
├── Hockey.csv          # Output file generated by the notebook
└── README.md           # Project documentation

How It Works

1. The target web page containing the hockey statistics table is requested with Requests.
2. The HTML content is parsed with BeautifulSoup.
3. The relevant <table> element is located.
4. Each table row (<tr>) is iterated over, and its cell values (<td>) are extracted.
5. The extracted text is cleaned with .strip() to remove leading and trailing whitespace.
6. Each row is appended to a Pandas DataFrame.
7. The DataFrame is saved to a CSV file (Hockey.csv) in the project directory.
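These steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the notebook's actual code: `URL` is a hypothetical placeholder, the helper names `parse_table` and `scrape` are invented for this example, and the statistics are assumed to live in the first `<table>` on the page. Rows are collected in a list and converted to a DataFrame once, which is the usual pandas idiom (`DataFrame.append` was removed in pandas 2.0).

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical placeholder; the notebook's real target URL is not reproduced here.
URL = "https://example.com/hockey-stats"


def parse_table(html: str) -> pd.DataFrame:
    """Extract the stripped <td> text of every row in the first <table>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # assumption: the stats are in the first table
    if table is None:
        raise ValueError("no <table> element found")

    rows = []
    for tr in table.find_all("tr"):
        # Clean each cell with .strip(), as described in the steps above.
        cells = [td.get_text().strip() for td in tr.find_all("td")]
        if cells:  # rows made only of <th> cells yield no <td> values; skip them
            rows.append(cells)

    # Build the DataFrame once from the collected rows.
    return pd.DataFrame(rows)


def scrape(url: str, out_path: str = "Hockey.csv") -> pd.DataFrame:
    """Fetch the page, parse its table, and write the result to CSV."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx responses
    df = parse_table(response.text)
    df.to_csv(out_path, index=False)
    return df
```

Calling `scrape(URL)` with a real address would fetch the page and write Hockey.csv, while `parse_table` can be exercised offline by passing it an HTML string.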

Usage

Open the notebook:
```
jupyter notebook webscraping.ipynb
```
Run all cells in sequence.
After execution, Hockey.csv is created in the notebook's working directory (by default, the folder containing webscraping.ipynb).

Output

Hockey.csv contains one row per team (or record) and one column per statistic. Values are stored as scraped from the source table, apart from whitespace trimming.
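For downstream analysis, the CSV can be read back with pandas. A minimal sketch, assuming Hockey.csv has a header row and was written without the DataFrame index; `load_stats` and `numeric_summary` are hypothetical helper names, not part of the project.

```python
import pandas as pd


def load_stats(path: str = "Hockey.csv") -> pd.DataFrame:
    """Load the scraped CSV for downstream analysis."""
    # read_csv infers dtypes: columns whose values all parse as numbers become
    # numeric, while text columns such as team names are left as strings.
    return pd.read_csv(path)


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, std, min, quartiles, and max for numeric columns only."""
    return df.select_dtypes(include="number").describe()
```

Because scraped values arrive as text, re-reading the file is also a convenient way to let pandas infer numeric types before computing statistics.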

Notes and Limitations

The scraper depends on the HTML structure of the target page. If that structure changes (for example, the table is moved or its columns are reordered), the code may need to be updated.
This project does not include rate limiting or advanced error handling.
Always review and comply with the target website’s robots.txt and terms of service before scraping.
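Since the project has no rate limiting, one way to honor both robots.txt and a polite request pace is Python's standard-library `urllib.robotparser`. This is a hypothetical sketch, not part of the project: `USER_AGENT`, `DEFAULT_DELAY`, and the helper functions are assumptions, and `fetch` stands in for whatever function actually downloads a page (for example, one wrapping `requests.get`).

```python
import time
from typing import Callable, Dict, Iterable
from urllib.robotparser import RobotFileParser

USER_AGENT = "hockey-stats-scraper"  # hypothetical user-agent string
DEFAULT_DELAY = 2.0  # seconds; an assumed polite default, not a site rule


def load_robots(robots_url: str) -> RobotFileParser:
    """Download and parse a site's robots.txt (this performs a network request)."""
    parser = RobotFileParser()
    parser.set_url(robots_url)
    parser.read()
    return parser


def polite_delay(parser: RobotFileParser, user_agent: str = USER_AGENT) -> float:
    """Use the site's Crawl-delay for this agent if declared, else DEFAULT_DELAY."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else DEFAULT_DELAY


def fetch_allowed(
    parser: RobotFileParser,
    urls: Iterable[str],
    fetch: Callable[[str], object],
    user_agent: str = USER_AGENT,
) -> Dict[str, object]:
    """Call fetch(url) for each URL robots.txt permits, pausing between requests."""
    delay = polite_delay(parser, user_agent)
    results: Dict[str, object] = {}
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # skip paths the site disallows for this agent
        if results:
            time.sleep(delay)  # wait before every request after the first
        results[url] = fetch(url)
    return results
```

The check is `delay is not None` rather than a truthiness test so that a declared `Crawl-delay: 0` is respected instead of being replaced by the default.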

Possible Enhancements

Add column headers explicitly for clarity
Implement exception handling for network and parsing errors
Parameterize the target URL
Add logging instead of print() statements
Package the logic into reusable functions or a module
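Several of these enhancements can be combined in a few reusable functions. This sketch is hypothetical (the function names and the 10-second timeout are illustrative choices, not the notebook's code). It takes the URL as a parameter, catches `requests.RequestException` instead of letting network errors stop the notebook, reports progress through the `logging` module, and takes column headers from the first row's `<th>` cells unless explicit names are passed.

```python
from __future__ import annotations

import logging

import pandas as pd
import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


def fetch_html(url: str, timeout: float = 10.0) -> str | None:
    """Return the page HTML, or None if the request fails (the error is logged)."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx status codes as errors
    except requests.RequestException as exc:
        logger.error("Request to %s failed: %s", url, exc)
        return None
    return response.text


def table_to_frame(html: str, columns: list[str] | None = None) -> pd.DataFrame:
    """Parse the first <table>; use the first row's <th> text as headers by default."""
    table = BeautifulSoup(html, "html.parser").find("table")
    if table is None:
        raise ValueError("no <table> element found")

    first_row = table.find("tr")
    header = [th.get_text().strip() for th in first_row.find_all("th")] if first_row else []

    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text().strip() for td in tr.find_all("td")]
        if cells:  # skip header-only rows
            rows.append(cells)

    # Use the scraped <th> headers only when they match the data width;
    # otherwise pandas assigns integer column labels.
    if columns is None and rows and len(header) == len(rows[0]):
        columns = header
    return pd.DataFrame(rows, columns=columns)


def scrape_to_csv(url: str, out_path: str = "Hockey.csv") -> bool:
    """Fetch, parse, and save the table; return True on success."""
    html = fetch_html(url)
    if html is None:
        return False
    try:
        df = table_to_frame(html)
    except ValueError as exc:
        logger.error("Could not parse a table from %s: %s", url, exc)
        return False
    df.to_csv(out_path, index=False)
    logger.info("Wrote %d rows to %s", len(df), out_path)
    return True
```

For example, calling `logging.basicConfig(level=logging.INFO)` and then `scrape_to_csv(url)` either writes Hockey.csv and logs the row count, or logs the failure and returns False.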

License

This project is provided for educational purposes. No warranty is expressed or implied.