RunningScraper

A web scraper that extracts product data from online stores selling athletic gear (originally intended only for running gear, hence the name) and indexes it in an Elasticsearch database in order to find discounted products.
It currently works for adidas.ca. The end goal is to scrape multiple sites, including Reebok, Nike, Footlocker and MEC, to provide a centralised store of the best deals on athletic gear.

A REST API for retrieving this data can be found in RunningScraperAPI.

How to Run Scraper

scrapy crawl adidas_ca -o adidas.json

adidas_ca is the name assigned to the scraper. Remember to set the start URLs (in sports_scraper/spiders/adidas_ca/adidas_ca.py) before running.
By default this dumps the data to a JSON file. To enable indexing into an Elasticsearch database, first make sure Elasticsearch is running and then enable ITEM_PIPELINES in sports_scraper/settings.py.
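
As a rough illustration of what enabling ITEM_PIPELINES might look like, here is a minimal sketch of the relevant fragment of sports_scraper/settings.py. ITEM_PIPELINES is a standard Scrapy setting that maps a pipeline's import path to an integer priority. The pipeline class path and the priority used here are assumptions made for illustration, so check the project's own settings.py for the actual entry.

```python
# Fragment of sports_scraper/settings.py (sketch only).
# Scrapy runs the pipelines listed here in ascending order of their
# integer values, which are conventionally chosen in the 0-1000 range.
ITEM_PIPELINES = {
    # Hypothetical class path: the real pipeline name in this project
    # may differ, so use the one already present in settings.py.
    "sports_scraper.pipelines.ElasticsearchPipeline": 300,
}
```

If the setting is left disabled, the crawl command above should still write its items to adidas.json.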

Notes:
Scrapy debugging notes
Scraping websites notes

