This site is best viewed on a desktop. Mobile layout is not fully optimized.

Help & Documentation


Requirements


Installation Steps

  1. Clone the MyETL repository: git clone https://github.com/philboiv1n/myetl
  2. Navigate into the new `myetl` directory: cd myetl
  3. Create your environment configuration files from the examples:
    cp example.env .env
    cp private_myetl/example.env.php private_myetl/env.php
  4. Edit both .env (for Docker) and private_myetl/env.php (for the application) with your database credentials.
    Note: When using Docker, the credentials in both files must match.
  5. If using Docker:
    • Run docker compose up --build. Dependencies are installed automatically.
    • Run the database installation script in the container: docker compose exec web php private_myetl/db_install.php
  6. If setting up manually:
    • Install dependencies: composer install
    • Create the database and tables: php private_myetl/db_install.php
  7. Set up a cron job to run jobs.php. All output and errors are automatically logged to private_myetl/logs/app.log. Example:
    # Run the job dispatcher every 15 minutes
    */15 * * * * /usr/local/bin/php /home/username/private_myetl/jobs.php

Using MariaDB with Docker

To use MariaDB instead of MySQL, simply change the database image in your docker-compose.yml file:

# docker-compose.yml
services:
  # ... other services
  db:
    # image: mysql:8.4
    image: mariadb:11.8
    # ... rest of the db service config

Important: If you are switching from an existing MySQL setup, you must first remove the old database volume to avoid data corruption. Run docker compose down -v before running docker compose up --build with the new MariaDB image. This will delete all existing data.

The application is fully compatible with MariaDB.


Project Structure

A brief overview of the main directories and files in the project.

  • / (Project Root)
    • composer.json: Defines PHP dependencies and autoloading.
    • docker-compose.yml: Docker configuration for local development.
    • README.md: Project overview and setup instructions.
    • vendor/: Composer-managed dependencies.
    • `/private_myetl`: Contains all backend logic. This directory should NOT be publicly accessible.
      • Controllers/: API controllers that handle requests.
      • jobs/: The ETL job definitions, separated into `extract/` and `transform/`.
      • downloads/: Stores raw and archived data files.
      • helpers/: Core ETL helper functions.
      • jobs.php: The main job dispatcher, run by cron.
      • db_install.php: Script to create the database and tables.
      • schema.sql: The database schema.
      • example.env.php / env.php: Environment configuration (DB credentials, etc.).
    • `/public_html/myetl`: The public-facing front-end. This is your web server's document root.
      • api/: The REST API endpoints.
      • assets/: CSS, JavaScript libraries, and images.
      • pages/: The main pages of the application (Home, Help, etc.).

Dependency Management

This project uses Composer to manage PHP dependencies. After cloning the project, you need to install these dependencies.

With Docker

If you are using the provided Docker environment, dependencies are installed automatically when the container starts.

With a Local PHP Environment

If you have PHP and Composer installed on your local machine, navigate to the project root and run: composer install


Caching

To improve performance and reduce redundant API calls, this project implements PSR-6 caching for all external requests made with the Guzzle HTTP client.

  • The cache duration (Time To Live) is configurable in your private_myetl/env.php file via the api_cache_ttl_seconds key.
  • It defaults to 1 hour (3600 seconds) if not specified.
  • The cache is stored on the filesystem in the private_myetl/cache/ directory.

Standards Compliance

This project aims to follow modern PHP best practices by adhering to the following PHP Standards Recommendations (PSR):

  • PSR-3: Logger Interface (Monolog)
  • PSR-4: Autoloader (Composer)
  • PSR-6: Caching Interface (Symfony Cache)
  • PSR-7: HTTP Message Interface (for API)
  • PSR-11: Container Interface (PHP-DI)
  • PSR-12: Extended Coding Style (PHP-CS-Fixer)
  • PSR-15: HTTP Handlers (League/Route)
  • PSR-17: HTTP Factories (Guzzle PSR-7)
  • PSR-18: HTTP Client (Guzzle)

These standards make the code more interoperable, readable, and maintainable.


Deployment

There are two common strategies for deploying this application to a production environment.

Strategy 1: Build on Server (Recommended)

This is the most reliable method and avoids autoloader path issues. It requires SSH access to your server.

  1. On your server, pull the latest changes from your Git repository.
  2. Navigate to the project root directory and run the Composer install command. This will build the dependencies and generate files with paths correct for the server environment.
    composer install --no-dev --optimize-autoloader

Strategy 2: Build Locally, then Upload

Use this method for shared hosting or environments where you don't have command-line access.

  1. On your local machine, run the following command to prepare the project for production.
    composer install --no-dev --optimize-autoloader
  2. Using an FTP/SFTP client, upload all project files to your server. You must include the generated directories:
    • vendor/
    • private_myetl/Generated/

Note: This method can sometimes cause autoloader issues if the server environment is significantly different from your local machine.


Important Notes

  • Ensure that the /private_myetl directory is located outside of the publicly accessible public_html directory.

Once the installation is complete:

  • If using Docker: visit http://localhost:8080 for the main app and http://localhost:8081 for Adminer.
  • If using shared hosting: point your domain to the public_html/ directory (or follow your host’s instructions).

Maintenance

How to add a new data feed?

Adding a new data feed involves three main steps:

  1. Update the Database Schema: Add or modify a table in private_myetl/schema.sql to store the new data, then run php private_myetl/db_install.php to apply the changes.
  2. Create an Extract Job: Create a PHP script in private_myetl/jobs/extract/ to download the data. This job needs a schedule key with a standard cron expression (e.g., '5 * * * *' for 5 past every hour) and a run function.
  3. Create a Transform Job: Create a PHP script in private_myetl/jobs/transform/ to parse the raw data and load it into the database. This job also needs a schedule key and a run function.
  4. Create a Display Page: Once the data is in the database, follow the steps in the next section to create a page to visualize it.

Transform Job Configuration

Your transform job will need a configuration array with the following keys:

  • raw_dir: Directory where the extract job saved the file.
  • archive_dir: Where to move the file after processing.
  • raw_file_pattern: A pattern (e.g., '*.json') to find the raw file(s).
  • db_table: The name of the database table you created.
  • For JSON files:
    • json_pointer: (For simple, flat JSON) A JSON Pointer to the array of records.
    • field_map: Maps source keys (using dot notation for nested objects) to database columns.
    • custom_parser: (For complex JSON) A function that gives you full control to parse the file and return an array of records. This is used for the MLB roster and standings feeds.
  • For CSV files:
    • csv_delimiter: The delimiter character (e.g., ',').
    • field_map: Maps CSV header names to database columns.
    • pre_sql: (Optional) An SQL command to run before processing, like TRUNCATE TABLE ....

Both field_map configurations allow you to use a function for a specific field if you need to transform the data, like combining fields or changing a date format.

'field_map' => [
    'db_column_name' => 'source_key.nested_key', // Direct mapping
    'full_name'      => function($record) {      // Custom logic
        return $record['first_name'] . ' ' . $record['last_name'];
    },
],

How to add a display page?

Follow these steps to add a new visualization or data table to the application:

  1. Create an API Controller: In private_myetl/Controllers/, create a new controller class (e.g., MyNewDataController.php). It should have a method that queries the database and returns the data as a plain PHP array. The router will automatically handle JSON conversion.
  2. Register the API Route: Open private_myetl/bootstrap.php and add a new route to the router configuration. For example:
    $router->get('/api/v1/my-new-data', [\MyETL\Controllers\MyNewDataController::class, 'index']);
  3. Create the Frontend Page: In public_html/myetl/pages/ (or a subdirectory like templates/), create a new PHP file for your page. This file will contain the HTML structure and the JavaScript (e.g., using Chart.js or AG Grid) needed to fetch data from your new API endpoint and render the visualization.
  4. Add a Navigation Link: Finally, add a link to your new page in public_html/myetl/partials/footer.php so users can access it.

Versioning

This project uses semantic versioning. The version number is managed in composer.json. During the composer install or composer update process, a script generates a version file (private_myetl/Generated/Version.php) that is read directly at runtime to display in the footer.

Credits & Acknowledgements

This project was made possible by the following open-source software and services:

Backend Libraries

Frontend Libraries

Platforms & Tools


Need Help?

Please contact me to report any issues or if you have questions.