Apartment scraper Argentina
- Ingest the data into the database through an Airflow DAG.
- Add more websites as data sources.
- Parallelize the scraping process (one thread per page).
- Create a process that matches the same property across different websites and detects which listing, if any, is the best option.
- Create an API with authentication that returns the requested data.
- Create a CI/CD pipeline.
- Migrate the whole process to a cloud provider.
- Build a small analysis and a dashboard.
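For the parallelization item, a minimal sketch of the one-thread-per-page idea follows. It assumes a hypothetical `scrape_page(url)` function, which is not part of this repository, that downloads one page and returns its listings as dictionaries. The helper name `scrape_all` is also illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def scrape_all(
    pages: list[str],
    scrape_page: Callable[[str], list[dict]],
) -> dict[str, list[dict]]:
    """Scrape all pages concurrently, giving each page its own worker thread.

    `scrape_page` is a hypothetical per-page scraper supplied by the caller.
    """
    if not pages:
        # ThreadPoolExecutor rejects max_workers=0, so handle the empty case first.
        return {}
    with ThreadPoolExecutor(max_workers=len(pages)) as pool:
        # map() preserves input order, so results line up with `pages`;
        # an exception raised in any worker thread is re-raised here.
        results = pool.map(scrape_page, pages)
        return dict(zip(pages, results))
```

A call would look like `scrape_all(["https://site-a.example/listings", "https://site-b.example/listings"], scrape_page)`, where both URLs are placeholders. Threads fit this workload because scraping is I/O-bound: the GIL is released while a thread waits on the network.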
To initialize the environment, run the following commands:
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env
docker-compose up airflow-init
These commands create the main folders and write the .env file with the AIRFLOW_UID configuration parameter (your host user ID). They also initialize the Airflow environment together with a default user (username: airflow, password: airflow).
To start the development environment, run:
docker-compose up
In another terminal, you can check which containers are running with:
docker ps
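To revert these changes, run the commands below. They stop and remove the containers, delete the volumes (including the Airflow database), and remove the folders created above:
docker-compose down --volumes --remove-orphans
rm -rf ./dags ./logs ./plugins
Note that the last command also deletes every DAG stored in the 'dags' folder.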
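To create an additional user, run the following command with the values in angle brackets replaced, then type the password when prompted. Use any Airflow container name listed by docker ps.
docker exec -it <container_name> airflow users create --username <username> --firstname <first_name> --lastname <last_name> --role <role> --email <email>
Valid roles include Admin, User, Op, Viewer and Public.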
To create a DAG, add its Python file to the 'dags' folder.
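There are two options to test a DAG:
- Open the UI at http://localhost:8080 and trigger the DAG there.
- Find the container ID with docker ps, open a shell inside the container with docker exec -it <container_id> bash, and run the DAG from the command line, for example with airflow dags test <dag_id> <date>.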