aivenpy is a Python-based site monitoring system that uses Aiven for PostgreSQL and Aiven for Apache Kafka to:
- Monitor any website
- Check, using a regular expression, whether a particular item is present on the site
- Publish website health stats (response_code, response_time, etc.) to Kafka
- Receive stats from Kafka and insert them into the database
Before proceeding further [Update as of 5 May 2021]
The system is now entirely automated. The Kafka producer is scheduled to run every 10 minutes, and the Kafka consumer has been scheduled to start with the release tag v.x.x (see https://github.com/briwoto/aivenpy/actions). So, if you plan to run this project on your local machine, go to GitHub Actions and stop the release build; this stops the consumer in the pipeline so that you can start it on your local machine.
- Framework
- How to run the project
- Project Architecture
- Code flow
- Github actions pipeline
- Advantages of the project implementation
- Tests
- If I had more time
- Contact
For development: I used open-source libraries instead of a full-fledged framework like Django or Flask, and focused on code clarity and clean architecture.
For testing: I used the pytest framework.
If you simply want to run the project without knowing about its mechanics, installing Docker is sufficient.
If you want to run the code yourself on your local machine, you will need the following pre-installed on your system:
Regardless of which of the two methods you choose, you will need the values of the following environment variables:
- AVDATABASE
- AVHOST
- AVPASSWORD
- AVPORT
- AVUSER
- AV_BASE_URL
- AV_KFUSER
- AV_KFPASSWORD
- AV_KFPORT
- BOOTSTRAP_SERVER
- CA_PEM
First, clone the repo to your local machine.

Clone with SSH:

```
git clone git@github.com:briwoto/aivenpy.git
```

OR

Clone with HTTPS:

```
git clone https://github.com/briwoto/aivenpy.git
```
NOTE: The step below for setting environment variables is for Mac/Linux. If you are using Windows, please look up the steps to set environment variables on Windows.
If you need the env var values to run the project, feel free to Contact Me.
The authentication method used for Kafka is SASL_SSL PLAIN, so you need a pem file in addition to the credentials (notice the `CA_PEM` variable mentioned in the pre-requisites). Aiven Kafka provides a CA file, which you have to convert to a `pem` file; the contents of that file are then saved in an environment variable.

First, convert the crt to a pem file. You can use this link for reference:

Once you have the `.pem` file, open it in any text editor, copy the contents, and paste them as the value of the `CA_PEM` environment variable. You can use this link for reference:
How to Export a Multi-line Environment Variable in Bash/Terminal e.g: RSA Private Key
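The two steps above can be sketched as follows. This is a minimal example, not the project's documented procedure; the first command only creates a throwaway certificate so the snippet is self-contained (in practice you would use the CA file downloaded from the Aiven console):

```shell
# For demonstration only: create a throwaway certificate standing in for
# Aiven's CA file (in practice, use the ca file downloaded from Aiven).
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out ca.crt \
    -days 1 -nodes -subj "/CN=demo" 2>/dev/null

# Convert the crt to a pem file (a no-op if it is already PEM-encoded):
openssl x509 -in ca.crt -out ca.pem -outform PEM

# Export the multi-line file contents as the CA_PEM environment variable:
export CA_PEM="$(cat ca.pem)"
```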
If you want to run the project via Method A, i.e. Docker, follow these steps:

Build the image

Run the command:

```
docker build -t consumer .
```

Run the container

Add values to the corresponding env vars below and run the command:

```
docker run -it \
  -e AVDATABASE="" \
  -e AVHOST="" \
  -e AVPORT="" \
  -e AVUSER="" \
  -e AVPASSWORD="" \
  -e AV_BASE_URL="" \
  -e AV_KFUSER="" \
  -e AV_KFPASSWORD="" \
  -e AV_KFPORT="" \
  -e BOOTSTRAP_SERVER="" \
  -e CA_PEM="" \
  consumer
```
If you want to run the project via Method B, i.e. directly through Python, first make sure that all the pre-requisites mentioned above are fulfilled. Then follow these steps:

Copy env vars to bashrc

Open a terminal tab/window and run the command:

```
sudo nano /etc/bashrc
```

The terminal might ask for a password. If it does, type the password and press Enter. Once the bashrc file opens, export all the environment variables with the correct values.

Example:

```
export AVUSER=dummyname
export AVPASSWORD=whateveryourpasswordis
...
...
```

Save the bashrc file and open a new terminal window.

Create and activate a virtual environment

Run this command to create a virtual environment:

```
python3 -m venv venv
```

Then, run this command to activate it:

```
source venv/bin/activate
```

Install dependencies

Run this command to install all dependencies:

```
pip install -r requirements.txt
```

Run the site monitor & Kafka producer

If you want to get fresh stats from the websites and publish them to Kafka, run:

```
make monitor_producer
```

If you want to run the Kafka consumer, run:

```
make consumer
```
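The two make targets wrap the project's two entry-point scripts, `monitor_producer.py` and `consumer.py`. A minimal Makefile along these lines is an assumption, not necessarily the repository's actual file:

```make
monitor_producer:
	python monitor_producer.py

consumer:
	python consumer.py
```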
The project consists of 3 layers:
- The Interactive layer says "what" to do
- The Business layer decides "how" to do it
- The Service layer decides "where" to go to get/update data
The interactive layer is the entry point of the project. This is the layer that:
- Interacts with the site monitor
- Tells the Kafka producer (business layer) to send data to Kafka (service layer)
- Starts the consumer
- Sends messages from the consumer to the database
Any interaction with a business layer component is done at this layer. Any interaction between two business layer components is also done at this layer.
The business layer is the backbone of the project. All logic, data extraction, and manipulation happen at this layer. The interactive layer says what to do; the business layer decides "how" to do it.
The service layer is nothing but the connection with in-house/third-party services, used to get/update what we need.
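The separation can be pictured with a toy example. All names here are illustrative stubs, not the project's actual functions; the point is only the direction of the calls between layers:

```python
# Interactive layer says *what* to do, business layer decides *how*,
# service layer knows *where* the data lives.

def fetch_sites():
    """Service layer: where the list of monitored sites comes from."""
    return ["https://example.com"]

def collect_stats(url):
    """Business layer: how stats are computed for one site (stubbed here)."""
    return {"url": url, "response_code": 200}

def run_monitor():
    """Interactive layer: what to do -- wire the other two layers together."""
    return [collect_stats(url) for url in fetch_sites()]
```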
There are two separate executions:
- The site monitor and Kafka producer are coupled into `monitor_producer.py`.
- The Kafka consumer is executed through `consumer.py`.
When you directly run

```
python monitor_producer.py
```

the code for the site monitor and producer is executed. The sequence of code execution is as follows:

- First, aiven.py calls the `config` module. The `pem` file is updated here. The `logger` library, which is used in all the programs to log info/warnings to the console, is also initiated.
- Then, aiven calls the `monitor` module to get the stats of the target website:
  - First, all the sites (whose stats we want) are fetched from the `sites` table
  - Then, stats are collected for each website
  - In addition to the stats, the code also checks a regular expression to verify whether a particular item is present in `response.text`
- Upon receiving the stats and the `regexp` boolean value, aiven.py sends these stats to the producer
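The stat-collection step might look roughly like this. This is a sketch, not the project's actual code: function and field names are assumptions based on the description above, and the standard library's `urlopen` is used here although the project likely uses `requests`:

```python
import re
import time
from urllib.request import urlopen  # project likely uses `requests` instead

def build_stats(url, status_code, body, elapsed, pattern):
    """Assemble the record that would be handed to the Kafka producer.
    Field names are assumptions based on the README's description."""
    return {
        "url": url,
        "response_code": status_code,
        "response_time": round(elapsed, 3),
        "regexp": bool(re.search(pattern, body)),
    }

def check_site(url, pattern):
    """Fetch the site and collect its stats (performs a network call)."""
    start = time.monotonic()
    with urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return build_stats(url, resp.status, body,
                           time.monotonic() - start, pattern)
```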
When you want to start the consumer, consumer.py needs to be run. You can simply run the make command to start it:

```
make consumer
```

- Once the Kafka consumer connection is set up, messages are received by consumer.py
- Each message, when received, is formatted via the `consumer_queries` module
- Once formatted, the data is inserted into the database
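A `consumer_queries`-style formatting helper might look like this. The field names and the tuple shape are illustrative assumptions; the real module is not shown in this README:

```python
import json

def format_message(raw: bytes) -> tuple:
    """Turn a raw Kafka message into a row ready for database insertion
    (field names are assumptions, not the project's actual schema)."""
    stats = json.loads(raw)
    return (
        stats["url"],
        int(stats["response_code"]),
        float(stats["response_time"]),
        bool(stats["regexp"]),
    )
```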
With the intent of keeping the architecture clean and isolating each layer, the talker module was created. Any interaction that needs to happen with the database happens via the talker module.
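The idea can be sketched with an injected backend. This is an illustration of the pattern, not the project's actual talker code; `sqlite3` stands in for Postgres so the snippet is self-contained:

```python
import sqlite3

class SqliteBackend:
    """Stand-in backend; the real project would have a Postgres equivalent."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE stats (url TEXT, response_code INTEGER)")

    def insert_stats(self, url, response_code):
        self.conn.execute("INSERT INTO stats VALUES (?, ?)", (url, response_code))

    def fetch_all(self):
        return self.conn.execute("SELECT url, response_code FROM stats").fetchall()

class Talker:
    """All database interaction goes through here. The backend is injected,
    so moving to a new database only means writing a new backend class."""
    def __init__(self, backend):
        self.backend = backend

    def save(self, url, response_code):
        self.backend.insert_stats(url, response_code)

    def history(self):
        return self.backend.fetch_all()
```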
Instead of looping with a while statement to get stats periodically, I set up a GitHub Actions pipeline.
As of now, the site monitor is completely automated. The stats are collected every 10 minutes via the GitHub Actions pipeline. This means that, every 10 minutes:
- A request is sent to the target website
- The collected stats are sent to Kafka
Visit the below link to view the status of each build:
https://github.com/briwoto/aivenpy/actions
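In GitHub Actions, a run-every-10-minutes schedule is expressed with a cron trigger. The following is a sketch of what such a workflow could look like; the repository's actual workflow file, job names, and steps may differ, and secrets handling is elided:

```yaml
name: monitor-producer

on:
  schedule:
    - cron: "*/10 * * * *"   # every 10 minutes

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - run: pip install -r requirements.txt
      - run: make monitor_producer
        # the env vars listed under pre-requisites would be injected here,
        # typically from repository secrets
```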
- Although I can't claim the project to be "100% clean", the focus while writing the code was on clean architecture. For instance:
- The `talker.py` module embodies the idea that the database service should be plug-and-play. If, in the future, we decide to move from Postgres to another database, or from on-prem to cloud services, we should only need to create the new connections without touching any of the existing code.
- Every package at every layer is basically an app in itself. This opens up the possibility of evolving the project so that the producer, consumer, and database all sit on different systems.
The test suite is divided into 3 test files:
- test_producer
- test_db
- test_site_monitor
The tests are very basic at this point; the coverage is at a unit-test level. For lack of time, I could not write detailed integration tests.
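A unit-level test in this style might look like the following. The function under test here is a stand-in written for illustration; the real test files and the monitor's actual function names are not shown in this README:

```python
import re

def item_present(body, pattern):
    """Stand-in for the monitor's regex check (illustrative, not the
    project's actual function)."""
    return bool(re.search(pattern, body))

def test_item_present():
    assert item_present("<h1>Example Domain</h1>", r"Example\s+Domain")

def test_item_absent():
    assert not item_present("<h1>Nothing here</h1>", r"Example\s+Domain")
```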
- I would like to spin off the project into different sub-projects, to "completely" isolate each app.
- Bug: The logger logs the same output twice in the console. I couldn't find time to fix this. I confirmed, however, that the data in the database is not getting duplicated; the issue is only with the logger function.
- Segregate the talker module into sub-modules, where one module would be responsible for formatting the data and another for sending it to the postgres.py db layer.
- Include a broader set of tests, given more time on the project.
- Implement an `outbox`. The idea here is to:
  - Take the stats and insert them into the `outbox` table. Also add a `topic_name` to this data, and a boolean flag that tells whether the data has been published to Kafka or not.
  - Create a common producer app that runs periodically: it fetches the unpublished data from `outbox` and sends it to Kafka.
  - Finally, update the boolean flag in the outbox table to indicate that the data has been sent to Kafka.

This approach is highly scalable when we have multiple apps and each app has its own Kafka topic to send data to. Following the outbox approach, there is no need for each team to implement its own Kafka producer. Since the outbox table will contain a column called `topic_name`, we can have one single producer that (a) picks up data from the outbox and sends it to Kafka; and (b) updates the boolean flag for that row in the table, to indicate that the data has been sent.
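The outbox idea above can be sketched in a few lines. This is an illustration of the pattern, not an implemented part of the project; `sqlite3` stands in for Postgres, and the `send` callable stands in for a Kafka producer:

```python
import sqlite3

# In-memory stand-in for the real database; table/column names follow
# the description above but are otherwise assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    topic_name TEXT,
    payload TEXT,
    published INTEGER DEFAULT 0)""")

def enqueue(topic_name, payload):
    """Step 1: insert the stats with their topic and an unpublished flag."""
    conn.execute("INSERT INTO outbox (topic_name, payload) VALUES (?, ?)",
                 (topic_name, payload))

def publish_pending(send):
    """Steps 2 and 3: fetch unpublished rows, hand each to send(topic,
    payload), then flip the published flag. Returns rows published."""
    rows = conn.execute(
        "SELECT id, topic_name, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, topic, payload in rows:
        send(topic, payload)
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```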
If you would like to talk more about the project and its architecture, or if you face any issues while running it, feel free to contact me (email mentioned below).
With best regards,
Rahul Singh
rahul.beck@gmail.com
