How are distributed OpenMRS deployments synchronized?

Hi everyone,

I am wondering: when people have a distributed OpenMRS deployment (e.g., OpenMRS used in several clinics in a region serving the same population segment), how do they handle the sync problem?

I am aware of the Sync 2.0 module, but I have been told it is not used in production (is this correct?). From this thread it seems some folks have tried it and reported issues, but not much progress has happened since. I have also seen dbsync, but again I don’t know where or how it is used, especially when it comes to conflict resolution and scale.

In general, syncing multiple MySQL DBs where all accept updates (i.e., not a master-slave scenario) is a non-trivial problem, e.g., when it comes to resolving conflicts, so I am curious how distributed deployments solve this problem.

Any information on implemented OpenMRS distributed architectures is appreciated.

CC: @alalo, @ssmusoke, @mksd

There have been a number of synchronization efforts for OpenMRS over the years, including:

  • PIH made a sophisticated synchronization solution many years ago that worked well for them and handled fancy bi-directional sync at the API level & dealt with intermittent connectivity, but required a lot of expertise to maintain and did not get widely adopted.
  • Bahmni uses an atom feed module to publish resource changes and synchronize them to other servers. This has worked well for Bahmni and comes with its own pros & cons as an approach.
  • Some sites use MySQL replication or bespoke approaches to synchronization.
  • SolDevelo led an effort to build a “sync2” module a couple of years ago, trying to repeat the success of PIH’s solution with a new solution based on Bahmni’s approach that would be usable by non-Bahmni implementations, but we weren’t able to get implementations invested in partnering with SolDevelo and I don’t believe sync2 managed to reach production.
  • AMPATH has tried various forms of synchronization, but opted to focus on robust connectivity and a beefy server in order to have all sites using a central server and obviate the need for synchronization.
  • Mekom Solutions has created dbsync as an alternative approach to synchronization of systems.

FYI – I believe om.rs/radar includes some opinions on sync, but we don’t have a generic solution anyone can use (yet). :slightly_smiling_face:

2 Likes

Hi @bashir,

With DB Sync we currently default to “new is the best” should one use it in a mesh with multidirectional syncs.
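As a rough illustration of what a “new is the best” (last-write-wins) policy means in practice, here is a minimal Python sketch. It is not DB Sync’s actual code: the `Row` type, the `date_changed` field and the `resolve` function are hypothetical names chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    """One version of a synced record (hypothetical shape, for illustration only)."""
    uuid: str
    site: str
    date_changed: datetime
    payload: dict


def resolve(local: Row, incoming: Row) -> Row:
    """Last-write-wins: keep whichever version was changed most recently.

    Ties go to the local version, so replaying the same message is a no-op.
    """
    if local.uuid != incoming.uuid:
        raise ValueError("can only resolve two versions of the same record")
    return incoming if incoming.date_changed > local.date_changed else local


# The same patient row edited at two sites, five minutes apart.
site_a = Row("p-1", "A", datetime(2020, 5, 1, 9, 0), {"phone": "111"})
site_b = Row("p-1", "B", datetime(2020, 5, 1, 9, 5), {"phone": "222"})
winner = resolve(site_a, site_b)
```

With a policy like this the older edit is simply discarded, and the outcome depends on the sites’ clocks being reasonably aligned.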

Happy to run you through it with @wyclif.

1 Like

Adding some more on this. We are definitely hoping for DB Sync to become a generic solution to support both pure OpenMRS-OpenMRS sync and OpenMRS-third party integration endeavours.

First and foremost, it provides a reusable OpenMRS Camel component, which leaves the door wide open to EIP.

The first production use of DB Sync will start in the summer to achieve a one way sync from multiple field instances to a central “HQ” instance.

Cc @frederic.deniger

3 Likes

Thanks @burke for this summary; this was very useful.

Thanks @mksd for the info.

For the one-way scenario that you mentioned, what happens when data is missing from field instances (but present in other instances/HQ), e.g., a patient is registered at site A but that same patient later goes to site B? Is there any “cache-miss resolution” type of strategy to fall back to HQ?

Also is it fair to say that in all sync scenarios, there is at least one DB instance which will have all data items (e.g., HQ in your first production case)? Or will there be cases that no instance necessarily has all data (i.e., data is distributed between instances)?

FYI, I am mostly thinking about the sync problem and how conflicts are resolved in the context of streaming ETL/ELT/Analytics solutions (this thread), where multiple update streams might need to be merged. It is quite possible that I am overthinking this problem right now.

One other question @mksd: What were the original reasons for deciding to create a new sync approach? Is there a doc/thread that I can take a look at? I understand the value of Camel integration but also wonder if there are any fundamental issues with atom-feed based approaches (or more generally Hibernate interceptors), which seem to be the option that some other OpenMRS use cases with similar needs (listening to DB changes) have taken.

Again asking this in the context of incremental ETL/ELT.

Field instances do not sync between each other. Eventually HQ will push down data on a subscription basis. I would imagine topics to which instances subscribe based on their own criteria (such as “give me only data for my country/region”), but that is not done or even decided at all and is out of scope right now.

If patients are duplicated, so for example if a patient, that is in fact the same person, is registered at multiple field sites, then they will be merged (over and over again) at HQ based on their patient identifiers. I am saying “over and over again”, because each time new patient data is synced upwards from any of the field instances where the patient is registered, a duplicate (re)appears again at HQ that might have been merged already in the past. That’s because syncing downwards and merging patients at the field instances is currently out of scope.
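To make the identifier-based merge step concrete, here is a minimal Python sketch of how duplicate candidates could be detected at HQ. It assumes one identifier per record; the record shape and the `find_merge_candidates` function are hypothetical and are not taken from DB Sync.

```python
from collections import defaultdict


def find_merge_candidates(patients):
    """Return {identifier: [uuids]} for identifiers shared by more than one record."""
    by_identifier = defaultdict(list)
    for patient in patients:
        by_identifier[patient["identifier"]].append(patient["uuid"])
    return {ident: uuids for ident, uuids in by_identifier.items() if len(uuids) > 1}


# Records as they might look at HQ after syncing up from sites A and B.
hq_patients = [
    {"uuid": "a-001", "site": "A", "identifier": "NID-42"},
    {"uuid": "b-007", "site": "B", "identifier": "NID-42"},
    {"uuid": "b-008", "site": "B", "identifier": "NID-77"},
]
candidates = find_merge_candidates(hq_patients)
```

Since the field instances keep their own unmerged copies, a check like this would have to run again after each upward sync, which is the “over and over again” described above.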

In our case yes, and that is HQ.

@bashir the decision not to use Sync 2.0 was based on a number of early findings that made it a deal breaker for the stakeholder at the time of the assessment (April 2019).

A couple of people who have looked into it might chime in (@wyclif, @willa, …). One of the major issues found with Sync 2.0 back then is that it did not cater properly for disorder, i.e., messages arriving out of order. It was, apparently, possible to quickly bump into issues such as “I can’t load that obs because it refers to an encounter that I don’t know (yet) about.” This very kind of issue was of high concern and led to redesigning a tool that would fully support this from the outset, enforcing so-called “eventual consistency” at the receiving end of the data, whatever the route in between (such as remote ActiveMQ nodes for instance).
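One common way to tolerate this kind of disorder at the receiving end is to park a row until the row it depends on has arrived, then retry. The Python sketch below shows that idea only; it is not how DB Sync is implemented, and the `Receiver` class and the `parent` field are hypothetical.

```python
class Receiver:
    """Applies synced rows, parking any row whose parent has not arrived yet."""

    def __init__(self):
        self.applied = {}   # uuid -> row that has been written locally
        self.parked = []    # rows waiting for a missing parent

    def receive(self, row):
        self.parked.append(row)
        self._drain()

    def _drain(self):
        # Keep retrying parked rows until a full pass applies nothing new.
        progressed = True
        while progressed:
            progressed = False
            for row in list(self.parked):
                parent = row.get("parent")
                if parent is None or parent in self.applied:
                    self.applied[row["uuid"]] = row
                    self.parked.remove(row)
                    progressed = True


receiver = Receiver()
receiver.receive({"uuid": "obs-1", "parent": "enc-1"})   # obs arrives before its encounter
waiting = [r["uuid"] for r in receiver.parked]
receiver.receive({"uuid": "enc-1", "parent": None})      # encounter shows up later
```

After the second call both rows are applied, encounter first, regardless of the order in which they travelled.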

Another point that was raised by @raff at the time was that it was probably a good idea to work at the database level 1) performance-wise and 2) because the DB schema in OpenMRS was actually quite stable. One of the advantages of APIs is that they hedge against DB schema changes, and in the end this was not such a good argument when looking at OpenMRS’ history. Performance, however, was the key factor. Early in the days of DbSync we could show that the large test dataset could be synced entirely in less than four hours with end-to-end encryption turned on.

For EIP purposes there was indeed a strong push for Camel, which still makes sense. Keep in mind as well that the solution needed to cover both 1) sync and 2) integration needs. It is intended to be used to sync patient files “from field to HQ” as well as to integrate OpenMRS with other systems locally (such as an ERP.)

I realise that it is a pity that the assessment wasn’t made more public a year ago. I’m trying to recollect the info for you here above. However, even in its alpha/beta stages, DbSync started to show promising results quite fast, which has reinforced our take that this was probably an acceptable approach and appropriate tech stack, even though we still revisit some assumptions as of now (cf. the other thread).

Happy to speak about it any time.

3 Likes

Thanks @mksd this was very useful information.

This sounds like a big limitation. I thought the way atomfeed is implemented, through Hibernate interceptors, guarantees that such out-of-order message sequences will not happen.

My understanding was that atomfeed too is used both for sync and integration with other systems. For example I think Bahmni uses an atomfeed based approach for OpenERP integration.

It seems the same core problem is to have a way to listen to OpenMRS data changes while the downstream use case is different (e.g., sync, ERP integration, ETL, etc.) and it seems there are multiple solutions developed for this same problem. It would be very nice if the OpenMRS community can adopt one and that same solution is used in all scenarios.

@bashir I would also add that there’s the Event module that also offers a way to propagate messages from Hibernate interceptors to JMS queues. I never really understood why the Atom Feed module was also introduced, rather than developing the Event module further.

When it comes to integrating OpenMRS with other systems, I don’t know that we can always enforce a common vision (and tech stack) that everyone would adopt. Depending on what people do with OpenMRS there will always be idiosyncrasies when it comes to EIP.

Don’t get me wrong however, I’m excited to see that a number of groups are starting to find alignments :slight_smile:

The Atom Feed module relies on use of the Event Module and REST Module. The Atomfeed module hooks into events in openmrs via the Event module. The atomfeed module provides links back to the object changed as webservices.rest urls.

1 Like

Actually, the Atom Feed module doesn’t use the Event module at all. Instead, it has its own Hibernate interceptor. From Jira, this appears to have been an initial stop-gap, as the original implementors seemed unsure of how stable the Event module API was, and the stop-gap was never removed. The two interceptors are basically identical except that the one in the Atom Feed module fires events before they are committed to the database.

1 Like

Thanks @ibacher for digging that out. We should accordingly update the wiki documentation: https://wiki.openmrs.org/display/docs/Atom+Feed+Module

Adding to the mix is the other, different Atom Feed module used by Bahmni: https://bahmni.atlassian.net/wiki/spaces/BAH/pages/3506200/Atom+Feed+Based+Synchronization+in+Bahmni

1 Like

The Atom Feed module doesn’t use the Event module. If I recall correctly, in atomfeed they wanted to respond to events within the same thread, whereas the Event module publishes the events to a message broker so that you respond to them in a different thread.

I agree that maybe this cannot be enforced, but as you pointed out, it would be very nice if we can compare/discuss all these solutions for basically the same problem, try to make some consensus/recommendations, and update all wiki pages/repos with a note about those recommendations.

Just to give more context about my own experience (as a new developer in the OpenMRS community): when I started to look at this problem a while ago, I first learnt about the Sync module, then Sync 2.0, then realized that it is based on the Atomfeed module, whose wiki page claims it is based on the Event module (which is not true, as others have pointed out, but I learnt it the hard way). Then I realized that Bahmni uses another (slightly different?) atomfeed based approach and also learnt about your work on dbsync, so it has been an “interesting” (but not very efficient) journey :slight_smile:

I think both OpenMRS’s Atomfeed module and Bahmni’s are based on the same ICT4H atomfeed implementation. According to the Sync 2.0 wiki page, Bahmni’s is a fork, but I have not looked at the details of that.

Yes - there is a decent talk thread describing this and also some useful ticket history to view around this here. There is also a ticket in the Atom Feed module to remove this duplicate functionality here.

Fundamentally, the issue is that there are use cases where we want to detect when something happens and react to it within the same transaction, and there are use cases where we want to only publish/subscribe if the transaction has completed successfully to ensure that messages are not sent off about transactions that are ultimately rolled back. We never got to a point (from what I recall) where the event module served both of these use cases.
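The difference between the two use cases can be shown with a toy Python sketch. This is not the Event module or Atom Feed API; the `Transaction` class and its two callbacks are hypothetical stand-ins for “react inside the transaction” and “publish only after a successful commit”.

```python
class Transaction:
    """Toy transaction that distinguishes in-transaction and post-commit listeners."""

    def __init__(self, on_change, after_commit):
        self.on_change = on_change        # called immediately, inside the transaction
        self.after_commit = after_commit  # called only if the commit succeeds
        self.pending = []

    def save(self, entity):
        self.pending.append(entity)
        self.on_change(entity)

    def commit(self):
        committed, self.pending = self.pending, []
        for entity in committed:
            self.after_commit(entity)

    def rollback(self):
        # Nothing was published yet, so there is nothing to retract.
        self.pending = []


seen_in_tx, published = [], []

tx = Transaction(seen_in_tx.append, published.append)
tx.save("encounter-1")
tx.rollback()          # the in-transaction listener has already fired for this one

tx = Transaction(seen_in_tx.append, published.append)
tx.save("encounter-2")
tx.commit()
```

Here the in-transaction listener sees both encounters, including the one that was rolled back, while the post-commit listener only ever sees the committed one.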

Mike

1 Like