Skip to content

TheCocoTeam/source-watcher-core

Repository files navigation

Source Watcher Core

codecov

scrutinizer-ci

All Contributors

Source Watcher Core is the ETL engine of the Source Watcher project. It can be used as a standalone library or as a dependency of Source Watcher API for server-side pipelines.

This is a PHP project that allows extracting, transforming, and loading data from and to different sources including databases, files, and services, while at the same time facilitating the transformation of the data in multiple ways.

Requirements

  • PHP 8.4 or later (see composer.json for extension requirements: curl, dom, json, pgsql, etc.)
  • Composer

Installation

composer install

If the lock file is not installable on your PHP version, run composer update once. Without local PHP, from the repo root:
docker run --rm -v "$(pwd)":/app -w /app/source-watcher-core composer:2 sh -c "composer update --no-interaction --ignore-platform-reqs".

Running tests

./vendor/bin/phpunit

What is ETL?

ETL is an abbreviation that stands: for extract, transform, and load. It's a software process used to fill data warehouses with information in three steps:

  • Extract: The process extracts or pulls data from multiple sources.

  • Transform: The incoming data passes through a transformation step.

  • Load: The ETL process will send the data to its final destination.

The foundations of ETL come from data warehousing methodologies dating back to the 1960s. ETL is the process of gathering raw data, like the one from production systems. Once collected, the data is transformed into a more readable, understandable format. The transformed and cleaned data is then loaded into a data repository, usually a relational database but not limited to other types of databases, files, and even REST services among others.

ETL example

Assume that you have some information in a CSV file, and you want to import the content of the file to a MySQL database.

The CSV file is a standard comma-separated value file with some headers: id, name and email.

You have a table in your MySQL database with some fields: id, name and email_address.

First you would need to extract the information from your CSV file, only the fields that you want to insert into your MySQL database.

After you have extracted the information, you want to rename the CSV header email to match the database field email_address.

Finally, you want to save the information in your database table.

<?php
use Coco\SourceWatcher\Core\Database\Connections\MySqlConnector;
use Coco\SourceWatcher\Core\IO\Inputs\FileInput;
use Coco\SourceWatcher\Core\IO\Outputs\DatabaseOutput;
use Coco\SourceWatcher\Core\SourceWatcher;
use Coco\SourceWatcher\Core\SourceWatcherException;

$mysqlConnector = new MySqlConnector();
$mysqlConnector->setUser( "user" );
$mysqlConnector->setPassword( "password" );
$mysqlConnector->setHost( "host" );
$mysqlConnector->setPort( 3306 );
$mysqlConnector->setDbName( "tests" );
$mysqlConnector->setTableName( "people" );

$sourceWatcher = new SourceWatcher();

try {
    $sourceWatcher
        ->extract( "Csv", new FileInput( __DIR__ . "/../data/csv/csv1.csv" ), [ "columns" => [ "name", "email" ] ] )
        ->transform( "RenameColumns", [ "columns" => [ "email" => "email_address" ] ] )
        ->load( "Database", new DatabaseOutput( $mysqlConnector ) )
        ->run();
} catch ( SourceWatcherException $exception ) {
    echo sprintf( "Something unexpected went wrong: %s", $exception->getMessage() );
}

A runnable version (requires .env with UNIT_TEST_MYSQL_* and a MySQL people table) is samples/CsvRenameColumnsToMysql.php. More samples are listed in samples/README.md.

Feedback

Please submit issues, and send your feedback and suggestions as often as you have them.

Contributors


Jean Paul Ruiz

💻

This project follows the all-contributors specification. Contributions of any kind welcome!

About

PHP ETL library: pipeline of extractors, transformers, and loaders (CSV/JSON/DB, etc.) run via a fluent API.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors