Data sources for ParkAPI2
This repository hosts the data sources (downloader, scraper) for the parkendd.de service which lists the number of free spaces of parking lots across Germany and abroad.
The repository for the database and API is ParkAPI2.
The scraper.py file is a command-line tool for developing, testing and finally integrating new data sources. Its output is always JSON-formatted.
Each data source is actually called a Pool and usually represents
one website from which lot data is collected.
To view the list of all pool IDs, type:
python scraper.py list

To download and extract data, type:
python scraper.py scrape [-p <pool-id> ...] [--cache]

The -p or --pools parameter optionally filters the available sources
by a list of pool IDs.
The optional --cache parameter caches all web requests, which is the fair thing to do
during scraper development because it avoids repeated requests to the source website.
If you have old cache files and want to create new ones, run once with --cache write
to fire new web requests and write the new files, then use --cache afterwards.
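The caching behaviour described above can be pictured roughly as follows. This is a minimal sketch and not the repository's implementation: the `CachedFetcher` class, its `mode` values, the file naming and the injected `fetch` function are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


class CachedFetcher:
    """Hypothetical file cache for web requests, keyed by URL."""

    def __init__(self, fetch: Callable[[str], str], cache_dir: Path,
                 mode: Optional[str] = None):
        # mode (assumed): None = no caching, "read" = reuse cached files,
        # "write" = always fire a new request and overwrite the cached file
        self.fetch = fetch
        self.cache_dir = cache_dir
        self.mode = mode

    def _path(self, url: str) -> Path:
        # One file per URL, named after the SHA-256 hash of the URL.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".txt")

    def get(self, url: str) -> str:
        if self.mode is None:
            return self.fetch(url)
        path = self._path(url)
        if self.mode == "read" and path.exists():
            return path.read_text(encoding="utf-8")
        # Cache miss or "write" mode: request again and store the result.
        text = self.fetch(url)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        return text


if __name__ == "__main__":
    # A fake fetch function stands in for a real web request.
    fetcher = CachedFetcher(lambda url: f"<html>{url}</html>",
                            Path(tempfile.mkdtemp()), mode="read")
    fetcher.get("https://www.mycity.de/parken/auslastung/")  # requested and stored
    fetcher.get("https://www.mycity.de/parken/auslastung/")  # served from the file
```

Injecting the fetch function keeps the sketch free of network access, so the same pattern can be exercised in tests with a fake.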
python scraper.py validate [-mp <max-priority>] [-p <pool-id> ...] [--cache]

The validate command validates the resulting snapshot data against the
JSON schema and prints warnings for fields that should be defined.
Use -mp 0 or --max-priority 0 to only print severe errors and
--max-priority 1 to include warnings about missing data in the most
important fields like latitude, longitude, address and capacity.
Use validate-text to print the data in a human-friendly format.
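The effect of --max-priority can be illustrated with a small sketch. It is not the validator used by scraper.py: the priority table, the `public_url` field at priority 2 and the helper names `collect_messages` and `filter_messages` are hypothetical, and only the idea of "0 = severe, 1 = most important fields" is taken from the description above.

```python
from typing import List, Optional, Tuple

# Hypothetical priorities: 0 = severe error, 1 = important field missing,
# 2 = less important field missing.
FIELD_PRIORITY = {
    "latitude": 1,
    "longitude": 1,
    "address": 1,
    "capacity": 1,
    "public_url": 2,
}

Message = Tuple[int, str]


def collect_messages(lot: dict) -> List[Message]:
    """Return (priority, text) pairs for everything wrong with one lot."""
    messages: List[Message] = []
    if not lot.get("id"):
        messages.append((0, "lot has no id"))
    for field, priority in FIELD_PRIORITY.items():
        if lot.get(field) is None:
            messages.append((priority, f"field '{field}' should be defined"))
    return messages


def filter_messages(messages: List[Message],
                    max_priority: Optional[int] = None) -> List[Message]:
    """Keep only messages whose priority is at most max_priority."""
    if max_priority is None:
        return list(messages)
    return [m for m in messages if m[0] <= max_priority]


lot = {"id": "mycity-lot-1", "capacity": 100}
all_messages = collect_messages(lot)
severe_only = filter_messages(all_messages, max_priority=0)
important = filter_messages(all_messages, max_priority=1)
```

Here `severe_only` is empty because the lot has an ID, while `important` holds the warnings for the missing latitude, longitude and address.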
Please feel free to ask questions by opening a new issue.
A data source needs to define a PoolInfo object and
for each parking lot a LotInfo and a LotData object
(defined in util/structs.py).
The python file that defines the source can be placed at
the project root or in a sub-directory and is automatically
detected by scraper.py as long as the util.ScraperBase
class is sub-classed.
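One way such automatic detection can work is by walking the subclasses of the base class. The sketch below only illustrates that idea and is not how scraper.py necessarily does it: `ScraperBase` here is a bare stand-in for util.ScraperBase, the `POOL` values are plain dicts instead of PoolInfo objects, `find_scrapers` is a hypothetical helper, and it assumes the modules containing the scrapers have already been imported.

```python
from typing import List, Type


class ScraperBase:
    """Stand-in for util.ScraperBase; the real class does much more."""
    POOL = None


class MyCity(ScraperBase):
    POOL = {"id": "my-city"}  # the real code uses a PoolInfo object


class OtherCity(ScraperBase):
    POOL = {"id": "other-city"}


def find_scrapers(base: Type[ScraperBase]) -> List[Type[ScraperBase]]:
    """Recursively collect all subclasses of `base` that define a POOL."""
    found = []
    for cls in base.__subclasses__():
        if cls.POOL is not None:
            found.append(cls)
        # Also pick up classes that derive from an intermediate subclass.
        found.extend(find_scrapers(cls))
    return found


pool_ids = sorted(cls.POOL["id"] for cls in find_scrapers(ScraperBase))
```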
An example for scraping an html-based website:
from typing import List

from util import *


class MyCity(ScraperBase):

    POOL = PoolInfo(
        id="my-city",
        name="My City",
        public_url="https://www.mycity.de/parken/",
        source_url="https://www.mycity.de/parken/auslastung/",
        attribution_license="CC-0",
    )

    def get_lot_data(self) -> List[LotData]:
        timestamp = self.now()
        soup = self.request_soup(self.POOL.source_url)

        lots = []
        for div in soup.find_all("div", {"class": "special-parking-div"}):
            # ... get lot_id, last_updated, state, lot_occupied
            #     and lot_total from the html dom

            lots.append(
                LotData(
                    id=name_to_id("mycity", lot_id),
                    timestamp=timestamp,
                    lot_timestamp=last_updated,
                    status=state,
                    num_occupied=lot_occupied,
                    capacity=lot_total,
                )
            )

        return lots

The PoolInfo is a static attribute of the scraper class and
the get_lot_data method must return a list of LotData objects.
The LotData object is really basic and does not contain any further information
about the parking lot, only the ID, status, free spaces and total capacity.
Additional lot information is either taken from a
geojson file or the get_lot_infos method
of the scraper class. scraper.py merges the LotInfo and
the LotData to create the final output, which must
comply with the JSON schema.
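The merge step can be pictured as a join on the lot ID. The following is a minimal sketch under stated assumptions, not the code in scraper.py: the two dataclasses are simplified stand-ins for the objects in util/structs.py with only a few fields, and the rule that live values override static ones only when they are set is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class LotInfo:  # simplified stand-in for util.structs.LotInfo
    id: str
    name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    capacity: Optional[int] = None


@dataclass
class LotData:  # simplified stand-in for util.structs.LotData
    id: str
    status: str
    num_occupied: Optional[int] = None
    capacity: Optional[int] = None


def merge_lots(infos: List[LotInfo], data: List[LotData]) -> List[dict]:
    """Join static info and live data by lot ID."""
    info_by_id: Dict[str, LotInfo] = {info.id: info for info in infos}
    merged = []
    for lot in data:
        info = info_by_id.get(lot.id)
        entry = asdict(info) if info is not None else {"id": lot.id}
        # Assumed rule: only overwrite with live values that are present.
        entry.update({k: v for k, v in asdict(lot).items() if v is not None})
        merged.append(entry)
    return merged


merged = merge_lots(
    [LotInfo("mycity-lot-1", "Lot 1", 50.9, 11.6, capacity=100)],
    [LotData("mycity-lot-1", "open", num_occupied=40)],
)
```

In this example the capacity comes from the LotInfo because the LotData does not provide one, while status and occupancy come from the live data.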
The geojson file should have the same name as the scraper file,
e.g. example.geojson. If the file exists, it will be used and
its properties must fit the util.structs.LotInfo object.
If it does not exist, the method get_lot_infos on the scraper
class will be called and should return a list of LotInfo objects.
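The lookup order described above can be sketched as follows. The function `load_lot_infos` is hypothetical, lot infos are represented as plain dicts rather than LotInfo objects, and the sketch assumes that every feature has a Point geometry from which latitude and longitude are taken.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_lot_infos(scraper_file: Path,
                   get_lot_infos: Callable[[], List[dict]]) -> List[dict]:
    """Prefer the sibling .geojson file, else fall back to the scraper method."""
    geojson_file = scraper_file.with_suffix(".geojson")
    if not geojson_file.exists():
        return get_lot_infos()

    collection = json.loads(geojson_file.read_text(encoding="utf-8"))
    infos = []
    for feature in collection["features"]:
        info = dict(feature["properties"])
        # GeoJSON stores point coordinates as [longitude, latitude].
        longitude, latitude = feature["geometry"]["coordinates"][:2]
        info["longitude"] = longitude
        info["latitude"] = latitude
        infos.append(info)
    return infos
```

Calling `load_lot_infos(Path("example.py"), scraper.get_lot_infos)` would then read example.geojson if it is present and only call the method otherwise.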
Some websites do provide most of the required information and it might be easier to scrape it from the web pages instead of writing the geojson file by hand. However, it might not be good practice to scrape this info every other minute. To generate a geojson file from the lot_info data:
# delete the old file if it exists
rm example.geojson
# run `get_lot_infos` and write to geojson
# (and filter for the `example` pool)
python scraper.py write-geojson -p example

The command show-geojson will write the contents to stdout for inspection.
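For orientation, the conversion from lot infos to geojson might look roughly like this. It is a sketch and not the output format guaranteed by write-geojson: `lot_infos_to_geojson` is a hypothetical helper, lot infos are plain dicts, and moving latitude and longitude out of the properties into a Point geometry is an assumption.

```python
import json
from typing import List


def lot_infos_to_geojson(infos: List[dict]) -> dict:
    """Build a GeoJSON FeatureCollection with one Point feature per lot."""
    features = []
    for info in infos:
        properties = {
            key: value for key, value in info.items()
            if key not in ("latitude", "longitude")
        }
        features.append({
            "type": "Feature",
            "properties": properties,
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [info["longitude"], info["latitude"]],
            },
        })
    return {"type": "FeatureCollection", "features": features}


example = lot_infos_to_geojson([
    {"id": "example-lot-1", "name": "Lot 1",
     "latitude": 50.9, "longitude": 11.6, "capacity": 120},
])
geojson_text = json.dumps(example, indent=2)
```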