This guide covers getting started with Vergil, Directory, and Handshake data. Links to the scrapers' repos can be found below.
- Clone the repository and move into it:
$ git clone git@github.com:NewsroomDevelopment/scraper-examples.git
$ cd scraper-examples
- Create a .env file with the contents below. (See this Google Doc for the MongoDB user credentials.) Make sure .env is always listed in your .gitignore file.
# MongoDB credentials
MDB_USERNAME=USERNAME
MDB_PASSWORD=PASSWORD
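As a rough illustration of how a scraper might read these values, here is a minimal sketch that parses a .env file with the standard library only and builds a MongoDB connection string. The helper names `load_env` and `mongo_uri` and the host `cluster.example.mongodb.net` are hypothetical placeholders, not taken from the scraper repos; the actual scrapers may load credentials differently (for example with a package such as python-dotenv).

```python
import os
from pathlib import Path
from urllib.parse import quote_plus


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def mongo_uri(env, host="cluster.example.mongodb.net"):
    """Build a connection string; `host` is a placeholder, not the real cluster."""
    # Percent-encode credentials so characters like '@' or ':' stay valid in a URI.
    user = quote_plus(env["MDB_USERNAME"])
    password = quote_plus(env["MDB_PASSWORD"])
    return f"mongodb+srv://{user}:{password}@{host}/"


if __name__ == "__main__":
    env = load_env()
    os.environ.update(env)  # make the values visible to code that reads os.environ
    print("Loaded credentials for user:", env["MDB_USERNAME"])
```

Printing only the username keeps the password out of terminal output and notebook cells.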
- Follow this tutorial to set up the AWS credentials needed for some of the scrapers.
- If you're using Python, install the necessary packages, then launch the virtual environment to get access to them:
$ pipenv install
$ pipenv shell
If you do not have pipenv, install it with pip:
$ pip install pipenv
If you do not have pip, see the pip installation documentation.
- In the pipenv shell, register the environment as a Jupyter kernel:
$ python -m ipykernel install --user --name=scraper-kernel
- Open Jupyter Notebook and change the kernel by going to Kernel -> Change kernel -> scraper-kernel.
With scraper-kernel selected, test out the scrapers in the notebook!
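If scraper-kernel does not show up in the kernel menu, a quick check like the one below may help. It is only a sketch: it assumes the `jupyter` command is on your PATH inside the pipenv shell and that `jupyter kernelspec list --json` reports installed kernels under a top-level `kernelspecs` key. The helper name `kernel_registered` is made up for this example.

```python
import json
import subprocess


def kernel_registered(listing_json, name="scraper-kernel"):
    """Return True if `name` is among the kernels in the JSON listing."""
    specs = json.loads(listing_json).get("kernelspecs", {})
    return name in specs


if __name__ == "__main__":
    # Ask Jupyter for its installed kernels in machine-readable form.
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    print("scraper-kernel registered:", kernel_registered(result.stdout))
```

If this prints False, rerunning the ipykernel install command from inside the pipenv shell is a reasonable first step.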