Request for Input: Migrating MambaETL to the OpenMRS Community Repositories

amugume · April 22, 2024, 5:34pm

Dear Community,

We’re reaching out for your insights regarding an upcoming transition involving the MambaETL codebase. The code is currently housed in UCSF-managed repositories, and we’re preparing to migrate it into the OpenMRS community repositories.

What is MambaETL?

MambaETL, or simply Mamba, is an implementation of data Extraction, Transformation, and Loading (ETL) within the OpenMRS (Open Medical Record System) ecosystem. Its primary function is to transform highly normalised data, such as Observation (Obs) data, into a more denormalised format, facilitating faster data retrieval and analysis.
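To illustrate what "denormalising" Obs data means, here is a minimal Python sketch that pivots one-row-per-observation data into one row per encounter. It is not MambaETL's implementation (which, as described below, uses stored procedures); the row layout, the concept names, and the `flatten_obs` helper are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified obs rows: (encounter_id, concept_name, value).
# The real OpenMRS obs table has many more columns than this.
OBS_ROWS = [
    (101, "weight_kg", 62.5),
    (101, "systolic_bp", 118),
    (102, "weight_kg", 70.0),
]


def flatten_obs(rows, columns):
    """Pivot one-row-per-observation data into one row per encounter."""
    # Every encounter starts with all requested columns set to None.
    flat = defaultdict(lambda: dict.fromkeys(columns))
    for encounter_id, concept, value in rows:
        if concept in columns:
            flat[encounter_id][concept] = value
    return [{"encounter_id": eid, **values} for eid, values in sorted(flat.items())]


FLAT_ROWS = flatten_obs(OBS_ROWS, ["weight_kg", "systolic_bp"])
for row in FLAT_ROWS:
    print(row)
```

Each encounter becomes a single wide row, so a report can read one table instead of joining and filtering the Obs table per concept.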

Our Approach:

The core functionality of MambaETL, which consists of a set of bash scripts and stored procedures/functions, is packaged as an OpenMRS module containing only the API submodule. This design allows implementers to integrate MambaETL functionality into their projects by including it as a dependency. It also lets us enhance and maintain the core library independently, without being entangled in implementer-specific details. Keeping the specifics separate also means implementers have less reason to fork the Mamba repository.

Key Considerations:

Compile-Time Dependencies: MambaETL relies on properties passed at compile time, such as target analysis database specifications. This information is crucial for generating Liquibase changeset files dynamically during the build process.
Auto-Generated Build Files: During compilation, an ETL (SQL) file is auto-generated, containing both core scripts and any implementer-specific additions. This file is deployed by Liquibase at runtime.
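The two considerations above can be sketched roughly as follows. This is not the actual MambaETL build (which runs as part of the module build and uses bash scripts); it is a Python illustration under assumed names. The property keys `analysis_db` and `build_id`, the SQL snippets, and both helper functions are hypothetical. Only the Liquibase `databaseChangeLog`, `changeSet`, and `sqlFile` elements are standard.

```python
from string import Template

# Hypothetical build properties; MambaETL's real property names may differ.
BUILD_PROPS = {"analysis_db": "analysis_db", "build_id": "1.0.0"}

# Stand-ins for the core scripts and an implementer-specific addition.
CORE_SQL = "CREATE DATABASE IF NOT EXISTS ${analysis_db};"
IMPLEMENTER_SQL = "CREATE TABLE IF NOT EXISTS ${analysis_db}.my_report (id INT);"

CHANGELOG = Template("""<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
    <changeSet id="mamba-etl-${build_id}" author="mamba">
        <sqlFile path="${sql_path}" relativeToChangelogFile="true"/>
    </changeSet>
</databaseChangeLog>
""")


def build_etl_sql(props, *scripts):
    """Concatenate scripts into one ETL file, filling in build-time properties."""
    return "\n".join(Template(script).substitute(props) for script in scripts)


def build_changelog(props, sql_path):
    """Render a changelog whose single changeset deploys the generated SQL file."""
    return CHANGELOG.substitute(props, sql_path=sql_path)


ETL_SQL = build_etl_sql(BUILD_PROPS, CORE_SQL, IMPLEMENTER_SQL)
CHANGELOG_XML = build_changelog(BUILD_PROPS, "etl.sql")
print(ETL_SQL)
print(CHANGELOG_XML)
```

The point of the sketch is that the generated files depend on values known only at build time, which is why the properties have to be supplied when the module is compiled rather than at runtime.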

Proposal:

We currently maintain the MambaETL core library module separately from any implementation-specific details. It contains only the logic and files that are core to the functioning of MambaETL.

All core updates and enhancements to MambaETL go into this module, and any dependent module can choose when to upgrade its version of MambaETL core.

We have also created a second module which is essentially a MambaETL Reference/quick start module, which demonstrates how Mamba core can be integrated into other projects.

An implementer can choose to use the MambaETL Reference module as is and build on top of it to meet their specific needs, or just use it as a guide to set up a separate module to support MambaETL. Below are the repository links:

MambaETL Core Module: Link
MambaETL Reference/Quick Start Module: Link

Community Input:

Considering the outlined approach and key considerations, we seek your perspective on whether to maintain the current modular structure or consolidate into a single module that encompasses both core functionality and implementer-specific extensions. Your feedback and suggestions are highly valued as we strive to optimise the MambaETL framework for the broader OpenMRS community.

Thank you for your time and contributions.

cc: @dkayiwa @eudson @dbaluku @slubwama @gomare @gkinyua

dkayiwa · April 22, 2024, 5:45pm

@amugume can you point us to the implementer-specific extensions in the second mamba-etl module?

amugume · April 23, 2024, 5:30am

@dkayiwa thank you for your prompt response.

In addition to the automated database flattening built into MambaETL, several customisable features are available to implementers:

Users have the capability to tailor configurations, specifying details such as the desired structure of the final flat tables, including table name, columns, and encounter type. This flexibility is facilitated through the configuration section.
Through the reports.json file, implementers can integrate report queries and input parameters, empowering them to refine data retrieval methods from the analysis_db according to their specific needs.
For more intricate reporting requirements, users can supplement MambaETL with their own ETL scripts, housed within the derived folder. This advanced feature enables precise control over report generation directly from the flat tables.
The parent pom.xml file also has some customisable features.
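As a rough illustration of the first point, a flat table configuration could drive the generated table definition along these lines. The JSON keys, the table and column names, and the `create_table_sql` helper are hypothetical and are not MambaETL's actual configuration schema; refer to the module documentation for the real format.

```python
import json

# Hypothetical configuration; the keys are illustrative only.
CONFIG_JSON = """
{
  "flat_table_name": "flat_encounter_anc",
  "encounter_type": "ANC",
  "columns": ["gestation_weeks", "hiv_status"]
}
"""


def create_table_sql(config):
    """Build a CREATE TABLE statement for the flat table described by config."""
    column_defs = ["encounter_id INT NOT NULL", "patient_id INT NOT NULL"]
    column_defs += [name + " TEXT NULL" for name in config["columns"]]
    body = ",\n    ".join(column_defs)
    return "CREATE TABLE " + config["flat_table_name"] + " (\n    " + body + "\n);"


FLAT_TABLE_SQL = create_table_sql(json.loads(CONFIG_JSON))
print(FLAT_TABLE_SQL)
```

Because the table shape comes from configuration rather than code, an implementer can change it without touching the core module.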

Comprehensive documentation of these functionalities, among others, is available within the module. Note, however, that these sections have been deliberately left out of the module itself to keep it generic as a foundational starting point.

Below, you’ll find screenshots illustrating some of the customisable implementer-specific extensions referenced above.

figure 1: shows config folder contents under the ../resources/_etl folder in the omod submodule

figure 2: shows one of the flat table configuration files for the ANC encounter type

figure 3: shows a cut-out section of the contents of the reports.json file

figure 4: shows one of the contents of the folders under the derived folder

dkayiwa · April 23, 2024, 6:37am

If mamba-etl is for implementer-specific customisations, then for those who just want the default without code customisations (and by the way, they are the majority), why can’t they have that in mamba-core alone? To give an example, I can run the OpenMRS reporting module with non-code customisations to create reports. But implementers like UgandaEMR that want code customisations extend this reporting module with something like ugandaemrreports. In summary, for an implementer without a need for code customisations, who just wants the default flattening of obs tables, why can’t they just use mamba-core?