Help & Documentation
Contents
Requirements
- Docker with Docker Compose
- PHP 8.4 (provided by the web container)
- MariaDB 12.1.2+ (provided by the db container)
Installation Steps
- Clone the MyETL repository:
git clone https://github.com/philboiv1n/myetl - Navigate into the new `myetl` directory:
cd myetl - Create your environment configuration files from the examples:
cp example.env .env
cp private_myetl/example.env.php private_myetl/env.php -
Edit both
.envandprivate_myetl/env.phpwith your database credentials.
Note: The credentials in both files must match. - Run
docker compose up --build -d. Dependencies are installed automatically. - Run the database installation script in the container:
docker compose exec web php private_myetl/db_install.php
MariaDB
MariaDB is the default database. The Docker Compose file pins the image to:
# docker-compose.yml
services:
# ... other services
db:
image: mariadb:12.1.2
# ... rest of the db service config
Important: If you are upgrading across MariaDB major versions, remove the old database volume to avoid data corruption. Run docker compose down -v before running docker compose up --build with the new MariaDB image. This will delete all existing data.
Scheduler
The scheduler service runs private_myetl/jobs.php every 15 minutes.
- The cron schedule is defined in
scripts/cron/myetl. - After changing the schedule, rebuild the scheduler image with
docker compose up -d --build scheduler.
Project Structure
A brief overview of the main directories and files in the project.
/(Project Root)composer.json: Defines PHP dependencies and autoloading.Dockerfile: Builds the web container image.Dockerfile.scheduler: Builds the scheduler container image.docker-compose.yml: Docker configuration for development and deployment.scripts/: Scheduler configuration and utilities.cron/myetl: Cron schedule forprivate_myetl/jobs.php.
README.md: Project overview and setup instructions.vendor/: Composer-managed dependencies.- `/private_myetl`: Contains all backend logic. This directory should NOT be publicly accessible.
Controllers/: API controllers that handle requests.jobs/: The ETL job definitions, separated into `extract/` and `transform/`.downloads/: Stores raw and archived data files.helpers/: Core ETL helper functions.jobs.php: The main job dispatcher, run by the scheduler.db_install.php: Script to create the database and tables.schema.sql: The database schema.example.env.php/env.php: Environment configuration (DB credentials, etc.).
- `/public_html`: The public-facing front-end. This is your web server's document root.
api/: The REST API endpoints.assets/: CSS, JavaScript libraries, and images.pages/: The main pages of the application (Home, Help, etc.).
Dependency Management
This project uses Composer to manage PHP dependencies. Dependencies are installed automatically when the containers build.
Caching
To improve performance and reduce redundant API calls, this project implements PSR-6 caching for all external requests made with the Guzzle HTTP client.
- The cache duration (Time To Live) is configurable in your
private_myetl/env.phpfile via theapi_cache_ttl_secondskey. - It defaults to 1 hour (3600 seconds) if not specified.
- The cache is stored on the filesystem in the
private_myetl/cache/directory.
Standards Compliance
This project aims to follow modern PHP best practices by adhering to the following PHP Standards Recommendations (PSR):
- PSR-3: Logger Interface (Monolog)
- PSR-4: Autoloader (Composer)
- PSR-6: Caching Interface (Symfony Cache)
- PSR-7: HTTP Message Interface (for API)
- PSR-11: Container Interface (PHP-DI)
- PSR-12: Extended Coding Style (PHP-CS-Fixer)
- PSR-15: HTTP Handlers (League/Route)
- PSR-17: HTTP Factories (Guzzle PSR-7)
- PSR-18: HTTP Client (Guzzle)
These standards make the code more interoperable, readable, and maintainable.
Deployment
Deploy with Docker Compose on your server:
- Clone or pull the repository on the server.
- Create or update
.envandprivate_myetl/env.phpwith your database credentials. - Build and start the containers:
docker compose up -d --build - If this is a new database, run:
docker compose exec web php private_myetl/db_install.php
Important Notes
- Ensure that the
/private_myetldirectory is located outside of the publicly accessiblepublic_htmldirectory.
Once the installation is complete:
- Visit
http://localhost:9090for the main app andhttp://localhost:9091for Adminer. - If you change ports in
docker-compose.yml, update these URLs accordingly.
Maintenance
How to add a new data feed?
Adding a new data feed involves three main steps:
- Update the Database Schema: Add or modify a table in
private_myetl/schema.sqlto store the new data, then rundocker compose exec web php private_myetl/db_install.phpto apply the changes. - Create an Extract Job: Create a PHP script in
private_myetl/jobs/extract/to download the data. This job needs aschedulekey with a standard cron expression (e.g.,'5 * * * *'for 5 past every hour) and arunfunction. - Create a Transform Job: Create a PHP script in
private_myetl/jobs/transform/to parse the raw data and load it into the database. This job also needs aschedulekey and arunfunction. - Create a Display Page: Once the data is in the database, follow the steps in the next section to create a page to visualize it.
Transform Job Configuration
Your transform job will need a configuration array with the following keys:
raw_dir: Directory where the extract job saved the file.archive_dir: Where to move the file after processing.raw_file_pattern: A pattern (e.g.,'*.json') to find the raw file(s).db_table: The name of the database table you created.- For JSON files:
json_pointer: (For simple, flat JSON) A JSON Pointer to the array of records.field_map: Maps source keys (using dot notation for nested objects) to database columns.custom_parser: (For complex JSON) A function that gives you full control to parse the file and return an array of records. This is used for the MLB roster and standings feeds.
- For CSV files:
csv_delimiter: The delimiter character (e.g.,',').field_map: Maps CSV header names to database columns.pre_sql: (Optional) An SQL command to run before processing, likeTRUNCATE TABLE ....
Both field_map configurations allow you to use a function for a specific field if you need to transform the data, like combining fields or changing a date format.
'field_map' => [
'db_column_name' => 'source_key.nested_key', // Direct mapping
'full_name' => function($record) { // Custom logic
return $record['first_name'] . ' ' . $record['last_name'];
},
],
How to add a display page?
Follow these steps to add a new visualization or data table to the application:
- Create an API Controller: In
private_myetl/Controllers/, create a new controller class (e.g.,MyNewDataController.php). It should have a method that queries the database and returns the data as a plain PHP array. The router will automatically handle JSON conversion. - Register the API Route: Open
private_myetl/bootstrap.phpand add a new route to the router configuration. For example:$router->get('/api/v1/my-new-data', [\MyETL\Controllers\MyNewDataController::class, 'index']); - Create the Frontend Page: In
public_html/pages/(or a subdirectory liketemplates/), create a new PHP file for your page. This file will contain the HTML structure and the JavaScript (e.g., using Chart.js or AG Grid) needed to fetch data from your new API endpoint and render the visualization. - Add a Navigation Link: Finally, add a link to your new page in
public_html/partials/footer.phpso users can access it.
Versioning
This project uses semantic versioning. The version number is managed in composer.json. During composer install/composer update (including container builds), a script generates private_myetl/Generated/Version.php, which is read directly at runtime to display in the footer.
Credits & Acknowledgements
This project was made possible by the following open-source software and services:
Backend Libraries
- Composer - Dependency Management
- Cron Expression - Cron for PHP
- ProxyManager LTS - Lazy-loading proxies for PHP-DI
- Guzzle - PSR-18 HTTP Client
- Guzzle Cache Middleware - PSR-6 caching for Guzzle
- Guzzle PSR-7 - PSR-7 HTTP Message Implementation
- JSON Machine - Memory-efficient JSON parsing
- Laminas Emitter - PSR-7 Response Emitter
- Monolog - PSR-3 Logging
- PHP-DI - PSR-11 Dependency Injection Container
- Symfony Cache - PSR-6 Cache Implementation