Building a Modern ETL / Analytics Pipeline for OpenMRS — Feedback Wanted

Hi OpenMRS Community,

My name is Ashar Ali, and I’m a Data Engineer exploring ways to build a modern ETL/analytics pipeline for OpenMRS.

From my research, I understand that while OpenMRS captures rich clinical data, there isn’t currently a fully featured, production-grade pipeline that can:

  1. Extract data from OpenMRS databases safely

  2. Transform/clean/normalize the data for analytics

  3. Load it into a warehouse or analytics-ready schema

  4. Support monitoring, logging, and scheduling for repeatable runs
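As a rough illustration of these four steps, a minimal Python sketch might look like the following. It rests on several assumptions and is not a proposed implementation: an in-memory SQLite database stands in for the OpenMRS MySQL database, the `obs` table is reduced to a handful of columns, the concept IDs in `CONCEPT_COLUMNS` and the `vitals_flat` target table are hypothetical, and `obs_id` is used as a simple watermark so that repeated runs only pick up new rows.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_sketch")

# Hypothetical mapping from concept_id to analytics column names (illustrative only).
CONCEPT_COLUMNS = {5089: "weight_kg", 5090: "height_cm"}


def extract(source, since):
    """Step 1: read-only query for non-voided observations newer than the watermark."""
    rows = source.execute(
        "SELECT obs_id, person_id, concept_id, obs_datetime, value_numeric "
        "FROM obs WHERE voided = 0 AND obs_id > ? ORDER BY obs_id",
        (since,),
    ).fetchall()
    log.info("extracted %d rows after obs_id %d", len(rows), since)
    return rows


def transform(rows):
    """Step 2: pivot EAV rows into one record per (person, datetime)."""
    records = {}
    for _obs_id, person_id, concept_id, obs_datetime, value in rows:
        column = CONCEPT_COLUMNS.get(concept_id)
        if column is None or value is None:
            continue  # skip unmapped concepts and empty values
        records.setdefault((person_id, obs_datetime), {})[column] = value
    return records


def load(target, records):
    """Step 3: upsert into a flat, analytics-ready table (assumed schema)."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS vitals_flat ("
        "person_id INTEGER, obs_datetime TEXT, weight_kg REAL, height_cm REAL, "
        "PRIMARY KEY (person_id, obs_datetime))"
    )
    for (person_id, obs_datetime), cols in records.items():
        target.execute(
            "INSERT INTO vitals_flat (person_id, obs_datetime, weight_kg, height_cm) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(person_id, obs_datetime) DO UPDATE SET "
            "weight_kg = COALESCE(excluded.weight_kg, vitals_flat.weight_kg), "
            "height_cm = COALESCE(excluded.height_cm, vitals_flat.height_cm)",
            (person_id, obs_datetime, cols.get("weight_kg"), cols.get("height_cm")),
        )
    target.commit()
    log.info("loaded %d records", len(records))


def run_once(source, target, since=0):
    """Step 4 (partly): one logged, repeatable run that returns the new watermark."""
    rows = extract(source, since)
    load(target, transform(rows))
    return max((row[0] for row in rows), default=since)


def demo():
    """Seed a stand-in source database with synthetic rows and run the pipeline once."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE obs (obs_id INTEGER PRIMARY KEY, person_id INTEGER, "
        "concept_id INTEGER, obs_datetime TEXT, value_numeric REAL, voided INTEGER)"
    )
    source.executemany(
        "INSERT INTO obs VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 10, 5089, "2024-01-01", 70.5, 0),
            (2, 10, 5090, "2024-01-01", 172.0, 0),
            (3, 11, 5089, "2024-01-02", 55.0, 1),  # voided, so it is skipped
        ],
    )
    target = sqlite3.connect(":memory:")
    watermark = run_once(source, target)
    return watermark, target.execute("SELECT * FROM vitals_flat").fetchall()
```

Calling `demo()` runs the pipeline once on three synthetic rows. In a real deployment the watermark returned by `run_once` would need to be stored between runs, a scheduler would trigger the runs, and a timestamp such as the row's creation date may be a safer watermark than `obs_id`; none of that is shown here.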

I also studied the existing MambaETL module and see it as a great reference, but it seems limited in orchestration, monitoring, and multi-hospital support.

I want to develop a pipeline that is modular, secure, and usable by hospitals, including those running local (on-premises) deployments. Before starting, I’d love to get feedback from the community:

  • Are these the main challenges hospitals face regarding analytics and reporting?

  • Are there specific analytics patterns, KPIs, or reports that would be most valuable?

  • Would hospitals be open to testing such a pipeline with demo or synthetic data first?

Any insights, suggestions, or guidance from implementers, developers, or hospital IT staff would be highly appreciated. I want to make sure the solution addresses real-world needs.

Thank you for your time and guidance!

Did you get a chance to look at this? "Real-Time ETL for OpenMRS Using Apache Flink: Proof of Concept & Feedback Request"

@itz_ashar_ali Here is the paper I wrote on this: "Real-Time ETL for EAV-Modeled Clinical Data: A CDC-Based Approach Using OpenMRS" (IEEE conference publication, available on IEEE Xplore)