Toward a standard approach for change data capture (CDC) for OpenMRS

@akimaina, @achachiez, @wyclif, and @mseaton (and anyone else who has used debezium with OpenMRS),

@mseaton and I have been trying to build momentum & consensus around shared tooling & an approach for data analytics with OpenMRS. The ultimate target is to have data analytics capability “out of the box” with OpenMRS that can deliver useful analytics (both for reporting and for EMR needs like decision support, analytics-informed EMR features, sync, etc.) on a single machine and easily scale up to distributed or a cloud-based solution when needed & available. We’ve broken data analytics into four areas [1]:

  1. Change Data Capture (CDC) - streaming data from OpenMRS
  2. Data processing pipelines and Extract Transform Load (ETL) – converting the data into usable forms
  3. Analytics Data Store – persisting data in reporting- and analytic-friendly formats
  4. Reporting and Business Intelligence (BI) tools – tooling to visualize data and render reports

We think a good place to start would be with CDC. The de facto standard within OpenMRS appears to be debezium, so we’d like to bring together folks with experience using debezium and OpenMRS and see (1) whether everyone is already using the same approach and (2) if approaches vary, whether we can identify a best practice.

Would you all be willing to join either an upcoming Analytics Engine Squad call (Thursdays 2pm UTC) or a TAC call (Mondays 4pm UTC) to share your experience with debezium & OpenMRS?

/cc @mksd @jdick @grace @ibacher @dkayiwa

  1. OpenMRS Reporting Strategy 2022

@aojwang, I know you’ve been a leader in the space of ETL with OpenMRS. Have you or Palladium had experience with connecting debezium to OpenMRS?

@achachiez, I had been led to believe @wyclif was doing debezium-related work at Mekom, but @mksd pointed out I should have directed my initial post to you (for Mekom). Cheers, -Burke

Unfortunately not. I was part of the initial discussions but I dropped off at some point.

For sure debezium sounds like the way forward. OpenMRS-EIP provides a framework that allows one to write a custom application that reacts to changes in an OpenMRS database, but this custom application would need to run outside of an OpenMRS instance. I can imagine some implementations might prefer to have this framework provided by a module, which would also allow us to achieve more things, as listed below:

  • Revisit the event module to depend on a debezium backed module
  • Create a more robust and reliable DB sync module
  • Load and incrementally stream data from OpenMRS into a target sink to address different requirements, e.g., feed and update an SHR, MPI, data warehouse, etc.

Thanks @wyclif. While installing a module is the easiest path for implementations to add new functionality, we’d like to migrate away from building (and owning/maintaining) an entire stack when there are better solutions that already exist (like debezium). What we’re imagining is a pre-configured debezium that could be brought up alongside OpenMRS to make CDC as close to an “out of the box feature” as possible, realizing that a database-level event stream could be helpful in some cases but most cases will want a domain-level event stream (e.g., “this patient has been changed”).
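To make the database-level vs. domain-level distinction concrete, here is a minimal sketch of how row-level change events could be folded into domain-level events like “this patient has been changed”. It assumes events arrive in the usual debezium envelope (`op`, `before`, `after`, `source.table`, `ts_ms`). The table-to-entity mapping and the function name are hypothetical, and the table/column names are taken from the standard OpenMRS data model and would need checking against the schema actually deployed.

```python
from typing import Optional

# Hypothetical mapping: OpenMRS table -> (domain entity, column holding that entity's id).
# person_name is keyed by person_id, which equals patient_id for patients in OpenMRS.
TABLE_TO_DOMAIN = {
    "patient": ("patient", "patient_id"),
    "patient_identifier": ("patient", "patient_id"),
    "person_name": ("patient", "person_id"),
    "encounter": ("encounter", "encounter_id"),
    "obs": ("obs", "obs_id"),
}

# debezium operation codes: c=create, u=update, d=delete, r=snapshot read
OP_NAMES = {"c": "created", "u": "changed", "d": "deleted", "r": "snapshot"}


def to_domain_event(change: dict) -> Optional[dict]:
    """Translate one row-level change event into a domain-level event.

    Returns None for tables that are not mapped to a domain entity.
    """
    # Events may or may not be wrapped in a {"schema": ..., "payload": ...} envelope.
    payload = change.get("payload", change)
    table = payload["source"]["table"]
    mapping = TABLE_TO_DOMAIN.get(table)
    if mapping is None:
        return None
    entity, id_column = mapping
    # Deletes carry the row in "before"; creates and updates carry it in "after".
    row = payload.get("after") or payload.get("before") or {}
    # Any change to a child table (e.g. person_name) counts as a change to the parent.
    if table == entity:
        action = OP_NAMES.get(payload["op"], "changed")
    else:
        action = "changed"
    return {
        "entity": entity,
        "id": row.get(id_column),
        "action": action,
        "ts_ms": payload.get("ts_ms"),
    }
```

With this mapping, an update to a `person_name` row becomes a single “patient changed” event that downstream consumers can act on without knowing the table layout.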

OpenMRS-EIP seems very close to this and maybe what we’re talking about is a dockerized OpenMRS-EIP. There’s some decent documentation on the architecture in openmrs-eip repo. Is there an overview or intro video or slide deck for OpenMRS-EIP that would show how it’s set up – e.g., is it assumed that debezium is already running & configured?

@mseaton do you have any experience with OpenMRS-EIP?

Exactly! The thought is having an “out of the box” CDC configuration (even if it requires a “docker-compose up” alongside OpenMRS) with minimal (hardware & learning curve) requirements to get started could provide a de facto standard for streaming data from OpenMRS for the purposes you describe as well as feeding data/services back to the EMR (e.g., derived concept calculations, decision support, smarter cohort queries, etc.).
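As a rough illustration of what a pre-configured CDC setup might contain, the sketch below builds a connector definition for the debezium MySQL connector. It assumes one possible deployment (debezium running in Kafka Connect next to OpenMRS) rather than the embedded engine that OpenMRS-EIP uses. The property names follow the debezium 2.x MySQL connector documentation and should be verified against the version deployed. The function name, host names, credentials, topic names, and table list are all placeholders.

```python
import json


def debezium_mysql_connector(name, db_host, db_user, db_password, kafka_bootstrap,
                             database="openmrs",
                             tables=("patient", "person", "encounter", "obs")):
    """Build a Kafka Connect connector definition for the debezium MySQL connector.

    All argument values are placeholders chosen for illustration.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": db_host,
            "database.port": "3306",
            "database.user": db_user,
            "database.password": db_password,
            # Must be unique among all clients reading the MySQL binlog.
            "database.server.id": "184054",
            # Prefix for the Kafka topics holding the change events.
            "topic.prefix": "openmrs",
            "database.include.list": database,
            # Capture only the listed tables, qualified as <database>.<table>.
            "table.include.list": ",".join(f"{database}.{t}" for t in tables),
            "schema.history.internal.kafka.bootstrap.servers": kafka_bootstrap,
            "schema.history.internal.kafka.topic": "schema-changes.openmrs",
        },
    }


if __name__ == "__main__":
    definition = debezium_mysql_connector(
        "openmrs-cdc", "mysql", "debezium", "change-me", "kafka:9092")
    print(json.dumps(definition, indent=2))
```

The resulting JSON could be posted to the Kafka Connect REST API (`POST /connectors`) once the containers are up. Shipping a file like this with the compose setup is one way to keep the learning curve low.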

@burke my idea of possibly having a debezium-based module would be extra credit. It would still be using debezium anyway, and I am not saying we should not implement an externalized stack outside of OpenMRS.

By the way, OpenMRS-EIP provides a database-level stream of events and not a domain-level one.