Help & Documentation
Contents
Requirements
- Web server running PHP 8.2
- MySQL 8.4 or MariaDB 11.8+
- Job scheduler (e.g., cron) for automated data processing
- Optional : Docker
Installation Steps
- Clone the MyETL repository:
git clone https://github.com/philboiv1n/myetl
- Navigate into the new `myetl` directory:
cd myetl
- Create your environment configuration files from the examples:
cp example.env .env
cp private_myetl/example.env.php private_myetl/env.php
-
Edit both
.env
(for Docker) andprivate_myetl/env.php
(for the application) with your database credentials.
Note: When using Docker, the credentials in both files must match. - If using Docker:
- Run
docker compose up --build
. Dependencies are installed automatically. - Run the database installation script in the container:
docker compose exec web php private_myetl/db_install.php
- Run
- If setting up manually:
- Install dependencies:
composer install
- Create the database and tables:
php private_myetl/db_install.php
- Install dependencies:
- Set up a cron job to run
jobs.php
. All output and errors are automatically logged toprivate_myetl/logs/app.log
. Example:
# Run the job dispatcher every 15 minutes
*/15 * * * * /usr/local/bin/php /home/username/private_myetl/jobs.php
Using MariaDB with Docker
To use MariaDB instead of MySQL, simply change the database image in your docker-compose.yml
file:
# docker-compose.yml
services:
# ... other services
db:
# image: mysql:8.4
image: mariadb:11.8
# ... rest of the db service config
Important: If you are switching from an existing MySQL setup, you must first remove the old database volume to avoid data corruption. Run docker compose down -v
before running docker compose up --build
with the new MariaDB image. This will delete all existing data.
The application is fully compatible with MariaDB.
Project Structure
A brief overview of the main directories and files in the project.
/
(Project Root)composer.json
: Defines PHP dependencies and autoloading.docker-compose.yml
: Docker configuration for local development.README.md
: Project overview and setup instructions.vendor/
: Composer-managed dependencies.- `/private_myetl`: Contains all backend logic. This directory should NOT be publicly accessible.
Controllers/
: API controllers that handle requests.jobs/
: The ETL job definitions, separated into `extract/` and `transform/`.downloads/
: Stores raw and archived data files.helpers/
: Core ETL helper functions.jobs.php
: The main job dispatcher, run by cron.db_install.php
: Script to create the database and tables.schema.sql
: The database schema.example.env.php
/env.php
: Environment configuration (DB credentials, etc.).
- `/public_html/myetl`: The public-facing front-end. This is your web server's document root.
api/
: The REST API endpoints.assets/
: CSS, JavaScript libraries, and images.pages/
: The main pages of the application (Home, Help, etc.).
Dependency Management
This project uses Composer to manage PHP dependencies. After cloning the project, you need to install these dependencies.
With Docker
If you are using the provided Docker environment, dependencies are installed automatically when the container starts.
With a Local PHP Environment
If you have PHP and Composer installed on your local machine, navigate to the project root and run: composer install
Caching
To improve performance and reduce redundant API calls, this project implements PSR-6 caching for all external requests made with the Guzzle HTTP client.
- The cache duration (Time To Live) is configurable in your
private_myetl/env.php
file via theapi_cache_ttl_seconds
key. - It defaults to 1 hour (3600 seconds) if not specified.
- The cache is stored on the filesystem in the
private_myetl/cache/
directory.
Standards Compliance
This project aims to follow modern PHP best practices by adhering to the following PHP Standards Recommendations (PSR):
- PSR-3: Logger Interface (Monolog)
- PSR-4: Autoloader (Composer)
- PSR-6: Caching Interface (Symfony Cache)
- PSR-7: HTTP Message Interface (for API)
- PSR-11: Container Interface (PHP-DI)
- PSR-12: Extended Coding Style (PHP-CS-Fixer)
- PSR-15: HTTP Handlers (League/Route)
- PSR-17: HTTP Factories (Guzzle PSR-7)
- PSR-18: HTTP Client (Guzzle)
These standards make the code more interoperable, readable, and maintainable.
Deployment
There are two common strategies for deploying this application to a production environment.
Strategy 1: Build on Server (Recommended)
This is the most reliable method and avoids autoloader path issues. It requires SSH access to your server.
- On your server, pull the latest changes from your Git repository.
- Navigate to the project root directory and run the Composer install command. This will build the dependencies and generate files with paths correct for the server environment.
composer install --no-dev --optimize-autoloader
Strategy 2: Build Locally, then Upload
Use this method for shared hosting or environments where you don't have command-line access.
- On your local machine, run the following command to prepare the project for production.
composer install --no-dev --optimize-autoloader
- Using an FTP/SFTP client, upload all project files to your server. You must include the generated directories:
vendor/
private_myetl/Generated/
Note: This method can sometimes cause autoloader issues if the server environment is significantly different from your local machine.
Important Notes
- Ensure that the
/private_myetl
directory is located outside of the publicly accessiblepublic_html
directory.
Once the installation is complete:
- If using Docker: visit
http://localhost:8080
for the main app andhttp://localhost:8081
for Adminer. - If using shared hosting: point your domain to the
public_html/
directory (or follow your host’s instructions).
Maintenance
How to add a new data feed?
Adding a new data feed involves three main steps:
- Update the Database Schema: Add or modify a table in
private_myetl/schema.sql
to store the new data, then runphp private_myetl/db_install.php
to apply the changes. - Create an Extract Job: Create a PHP script in
private_myetl/jobs/extract/
to download the data. This job needs aschedule
key with a standard cron expression (e.g.,'5 * * * *'
for 5 past every hour) and arun
function. - Create a Transform Job: Create a PHP script in
private_myetl/jobs/transform/
to parse the raw data and load it into the database. This job also needs aschedule
key and arun
function. - Create a Display Page: Once the data is in the database, follow the steps in the next section to create a page to visualize it.
Transform Job Configuration
Your transform job will need a configuration array with the following keys:
raw_dir
: Directory where the extract job saved the file.archive_dir
: Where to move the file after processing.raw_file_pattern
: A pattern (e.g.,'*.json'
) to find the raw file(s).db_table
: The name of the database table you created.- For JSON files:
json_pointer
: (For simple, flat JSON) A JSON Pointer to the array of records.field_map
: Maps source keys (using dot notation for nested objects) to database columns.custom_parser
: (For complex JSON) A function that gives you full control to parse the file and return an array of records. This is used for the MLB roster and standings feeds.
- For CSV files:
csv_delimiter
: The delimiter character (e.g.,','
).field_map
: Maps CSV header names to database columns.pre_sql
: (Optional) An SQL command to run before processing, likeTRUNCATE TABLE ...
.
Both field_map
configurations allow you to use a function for a specific field if you need to transform the data, like combining fields or changing a date format.
'field_map' => [
'db_column_name' => 'source_key.nested_key', // Direct mapping
'full_name' => function($record) { // Custom logic
return $record['first_name'] . ' ' . $record['last_name'];
},
],
How to add a display page?
Follow these steps to add a new visualization or data table to the application:
- Create an API Controller: In
private_myetl/Controllers/
, create a new controller class (e.g.,MyNewDataController.php
). It should have a method that queries the database and returns the data as a plain PHP array. The router will automatically handle JSON conversion. - Register the API Route: Open
private_myetl/bootstrap.php
and add a new route to the router configuration. For example:$router->get('/api/v1/my-new-data', [\MyETL\Controllers\MyNewDataController::class, 'index']);
- Create the Frontend Page: In
public_html/myetl/pages/
(or a subdirectory liketemplates/
), create a new PHP file for your page. This file will contain the HTML structure and the JavaScript (e.g., using Chart.js or AG Grid) needed to fetch data from your new API endpoint and render the visualization. - Add a Navigation Link: Finally, add a link to your new page in
public_html/myetl/partials/footer.php
so users can access it.
Versioning
This project uses semantic versioning. The version number is managed in composer.json
. During the composer install
or composer update
process, a script generates a version file (private_myetl/Generated/Version.php
) that is read directly at runtime to display in the footer.
Credits & Acknowledgements
This project was made possible by the following open-source software and services:
Backend Libraries
- Composer - Dependency Management
- Cron Expression - Cron for PHP
- ProxyManager LTS - Lazy-loading proxies for PHP-DI
- Guzzle - PSR-18 HTTP Client
- Guzzle Cache Middleware - PSR-6 caching for Guzzle
- Guzzle PSR-7 - PSR-7 HTTP Message Implementation
- JSON Machine - Memory-efficient JSON parsing
- Laminas Emitter - PSR-7 Response Emitter
- Monolog - PSR-3 Logging
- PHP-DI - PSR-11 Dependency Injection Container
- Symfony Cache - PSR-6 Cache Implementation