Source Watcher Core is the ETL engine of the Source Watcher project. It can be used as a standalone library or as a dependency of Source Watcher API for server-side pipelines.
This is a PHP project that allows extracting, transforming, and loading data from and to different sources including databases, files, and services, while at the same time facilitating the transformation of the data in multiple ways.
- PHP 8.4 or later (see
composer.jsonfor extension requirements: curl, dom, json, pgsql, etc.) - Composer
composer installIf the lock file is not installable on your PHP version, run composer update once. Without local PHP, from the repo root:
docker run --rm -v "$(pwd)":/app -w /app/source-watcher-core composer:2 sh -c "composer update --no-interaction --ignore-platform-reqs".
./vendor/bin/phpunitETL is an abbreviation that stands: for extract, transform, and load. It's a software process used to fill data warehouses with information in three steps:
-
Extract: The process extracts or pulls data from multiple sources.
-
Transform: The incoming data passes through a transformation step.
-
Load: The ETL process will send the data to its final destination.
The foundations of ETL come from data warehousing methodologies dating back to the 1960s. ETL is the process of gathering raw data, like the one from production systems. Once collected, the data is transformed into a more readable, understandable format. The transformed and cleaned data is then loaded into a data repository, usually a relational database but not limited to other types of databases, files, and even REST services among others.
Assume that you have some information in a CSV file, and you want to import the content of the file to a MySQL database.
The CSV file is a standard comma-separated value file with some headers: id, name and email.
You have a table in your MySQL database with some fields: id, name and email_address.
First you would need to extract the information from your CSV file, only the fields that you want to insert into your MySQL database.
After you have extracted the information, you want to rename the CSV header email to match the database field email_address.
Finally, you want to save the information in your database table.
<?php
use Coco\SourceWatcher\Core\Database\Connections\MySqlConnector;
use Coco\SourceWatcher\Core\IO\Inputs\FileInput;
use Coco\SourceWatcher\Core\IO\Outputs\DatabaseOutput;
use Coco\SourceWatcher\Core\SourceWatcher;
use Coco\SourceWatcher\Core\SourceWatcherException;
$mysqlConnector = new MySqlConnector();
$mysqlConnector->setUser( "user" );
$mysqlConnector->setPassword( "password" );
$mysqlConnector->setHost( "host" );
$mysqlConnector->setPort( 3306 );
$mysqlConnector->setDbName( "tests" );
$mysqlConnector->setTableName( "people" );
$sourceWatcher = new SourceWatcher();
try {
$sourceWatcher
->extract( "Csv", new FileInput( __DIR__ . "/../data/csv/csv1.csv" ), [ "columns" => [ "name", "email" ] ] )
->transform( "RenameColumns", [ "columns" => [ "email" => "email_address" ] ] )
->load( "Database", new DatabaseOutput( $mysqlConnector ) )
->run();
} catch ( SourceWatcherException $exception ) {
echo sprintf( "Something unexpected went wrong: %s", $exception->getMessage() );
}A runnable version (requires .env with UNIT_TEST_MYSQL_* and a MySQL people table) is samples/CsvRenameColumnsToMysql.php. More samples are listed in samples/README.md.
Please submit issues, and send your feedback and suggestions as often as you have them.
Jean Paul Ruiz 💻 |
This project follows the all-contributors specification. Contributions of any kind welcome!
