MinuraSilva/RunningScraper

RunningScraper

A web scraper that extracts product data from online stores selling athletic gear (originally intended only for running gear, hence the name) and indexes it in an Elasticsearch database, with the purpose of finding discounted products.
It currently works for adidas.ca. The end goal is to scrape multiple sites, including Reebok, Nike, Footlocker and MEC, to provide a centralised store of the best deals on athletic gear.

A REST API for retrieving this data can be found in RunningScraperAPI.

How to Run Scraper

scrapy crawl adidas_ca -o adidas.json

  • adidas_ca is the name assigned to the scraper. Remember to set the start URLs (in sports_scraper/spiders/adidas_ca/adidas_ca.py) before running.
  • By default this dumps the data to a JSON file (adidas.json in the command above). To enable indexing into an Elasticsearch database, first make sure Elasticsearch is running, then enable ITEM_PIPELINES in sports_scraper/settings.py.
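
For orientation, here is a minimal sketch of what an Elasticsearch item pipeline and the matching ITEM_PIPELINES entry can look like in a Scrapy project. It is not the pipeline shipped with this repository: the class name ElasticsearchPipeline, the index name "products", the URL http://localhost:9200 and the module path sports_scraper.pipelines are all assumptions made for illustration. Check sports_scraper/settings.py for the real entry to uncomment.

```python
# Hypothetical sketch: the class name, index name, URL and module path are
# assumptions for illustration, not taken from the RunningScraper source.


class ElasticsearchPipeline:
    """Scrapy item pipeline that indexes each scraped item into Elasticsearch."""

    def __init__(self, es_url="http://localhost:9200", index_name="products", client=None):
        self.es_url = es_url
        self.index_name = index_name
        # A client can be injected (e.g. for testing); otherwise one is
        # created when the spider opens.
        self.client = client

    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts.
        if self.client is None:
            # Imported lazily so this module loads even without the package.
            from elasticsearch import Elasticsearch

            self.client = Elasticsearch(self.es_url)

    def process_item(self, item, spider):
        # Scrapy calls this for every scraped item.
        doc = dict(item)
        # `document=` is the keyword used by elasticsearch-py 8.x;
        # older 7.x clients use `body=` instead.
        self.client.index(index=self.index_name, document=doc)
        # Returning the item keeps it available to later pipelines and to
        # the `-o adidas.json` feed export.
        return item


# In sports_scraper/settings.py: the key is the import path of the pipeline
# class (assumed here) and the value is its priority (0-1000, lower runs first).
ITEM_PIPELINES = {
    "sports_scraper.pipelines.ElasticsearchPipeline": 300,
}
```

Because process_item returns the item unchanged, the JSON output from the -o option keeps working alongside indexing.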

Notes:
Scrapy debugging notes
Scraping websites notes
