Sync 2.0 Planning

Thanks @ayeung and @aramirez - we are looking forward to working with FGH Mozambique & Global Brigades during the development of Sync 2.0. Your testing and input on the design/development of Sync 2.0 will definitely help us build a solution that will meet the needs of the community.

@raff and @jthomas - I agree that the next step will be documenting on the wiki (in the Sync space) and organizing a design call to discuss further.

I would also consider the option that although this will be a replacement for the sync module, we might choose to give it a new name. And since the codebase will be 100% new there is no clear value to doing this in the old sync module’s git repo.

-Darius (by phone)

(thought I posted this in response to Mike, but just seeing now that I didn’t…)

Since it is going to be a completely new code base, doesn’t it make sense to start a new repo? I guess from a naming convention it does make it easier. Is there any kind of best practice around this type of situation?

That being said, if the general consensus is to use the same repo, I’m fine with that…

Mark

I’d also vote for having the new version in a new repo since the codebase is really completely different. What would the module id be, though?

Angular 1 and 2 are in two different repos:

Jackson JSON 1 is in svn and 2 is in github:

OpenLMIS created a new set of repos when they did a complete refactor/rewrite (I advised on this):

So, it seems like large-scale code rewrites prefer to create new repos. (E.g. you would never want a developer to do a git pull and inadvertently go from sync 1.x to sync 2.x.)

I recommend that we delay on deciding the module id and the exact naming/marketing as we do design, figure out what different independent components are involved, what might overlap with a connect-to-MPI/HIE project, etc. (We can continue to call the project Sync 2.0 in the interim.)

KEMRI-SEARCH and KEMRI-FACES are extensive users of sync 1.x version and I would be happy to be part of the testing team.

Erick

FYI @carl I want to make sure you’re informed about this conversation because you’re doing so much work with HIE and FHIR.

Thanks @werick, we will keep you in the loop.

I don’t see an issue with doing a new git repo - it has some advantages over branching.

Did we decide on using the existing Jira and Wiki? I noticed that there are about 50 unresolved issues for sync - doesn’t necessarily mean we need a new project, we can just use the version fields to keep track of that.

If we are using the existing sync wiki, I can start putting together a doc based on the initial requirements and discussions on talk.

This sounds like a great effort. We are rather invested in FHIR so this seems like an excellent move.

It may be worth considering the FHIR transaction/batch interactions to perform the sync in a push model. Otherwise, we could look at performing global searches for a pull model.

I don’t believe FHIR makes use of ATOM feeds anymore; rather, they have moved to using a Bundle resource to list multiple resources along with metadata.
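To make the push idea a bit more concrete, here is a minimal sketch of wrapping changed resources in a FHIR transaction Bundle. The Bundle shape (`resourceType`, `type`, and `entry` items carrying a `resource` plus a `request`) follows the FHIR specification; the helper name `build_transaction_bundle` and the sample patient are purely illustrative assumptions, not anything from the existing OpenMRS FHIR module.

```python
import json


def build_transaction_bundle(resources):
    """Wrap FHIR resources (dicts with resourceType and id) in a transaction Bundle.

    Each entry uses PUT to "<resourceType>/<id>" so the receiving server
    creates or updates the resource under the id chosen by the sender.
    """
    entries = []
    for resource in resources:
        entries.append({
            "resource": resource,
            "request": {
                "method": "PUT",
                "url": f"{resource['resourceType']}/{resource['id']}",
            },
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# Hypothetical sample data: a single changed patient.
patient = {"resourceType": "Patient", "id": "example-uuid", "active": True}
bundle = build_transaction_bundle([patient])
print(json.dumps(bundle, indent=2))
```

In a push model the client would POST this Bundle to the server's FHIR base URL, and the server would process all entries as one unit.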

We can start documenting at Sync 2.0 - Projects - OpenMRS Wiki

Typically a JIRA project is linked with one repo, so I’m not sure if we should reuse the JIRA project but not the repo, unless the idea is to retire the old repo.

Hello everyone,

I have added design documentation for the module as child pages under:

https://wiki.openmrs.org/display/projects/Sync+2.0

Please let me know any feedback you have - I will be incorporating it in the documentation.

Thanks, Paweł

Thanks @pgesek I was just wondering whether there was an update on this. :slight_smile:

Comments about “Atomfeed for Sync 2.0”:

  • If we’re going to refactor this, maybe we can move away from having a lot of global properties, and instead just have a single feed config that an implementation can specify with a json or yaml file
  • I like the idea of using an event-based hibernate interceptor instead of an advice class for every OpenMRS class
  • “Change the default urls to point to FHIR resources” => I would hope that Sync 2.0 can also be run with Bahmni, so I hope there’s a way that a single set of feeds can work for both master-slave sync, and Bahmni’s MRS/ERP/LIS communication. Maybe we can introduce a new format where we provide multiple possible links to the event, e.g. both the FHIR one and the REST one.
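As a rough illustration of the first and last bullets, the sketch below reads a single JSON feed config and produces several links for one event. Everything here is an assumption made for the example: the config layout, the `build_event_links` helper, and the URL templates are hypothetical and are not an existing atomfeed or Sync format.

```python
import json

# Hypothetical single feed config replacing many global properties.
FEED_CONFIG = json.loads("""
{
  "feeds": {
    "patient": {
      "enabled": true,
      "links": {
        "fhir": "/ws/fhir/Patient/{uuid}",
        "rest": "/ws/rest/v1/patient/{uuid}?v=full"
      }
    }
  }
}
""")


def build_event_links(config, category, uuid):
    """Return one link per configured representation for a changed object."""
    feed = config["feeds"][category]
    if not feed["enabled"]:
        return []
    return [
        {"rel": rel, "href": template.format(uuid=uuid)}
        for rel, template in sorted(feed["links"].items())
    ]


links = build_event_links(FEED_CONFIG, "patient", "abc-123")
for link in links:
    print(link["rel"], link["href"])
```

A FHIR-based sync client would follow the `fhir` link, while a consumer that expects OpenMRS REST representations could keep using the `rest` link from the same event.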

Comments about “Auditing and error handling”:

  • I wonder if this is going to be an excessive amount of data to be storing
  • It feels more natural to log this to a file (at least for successful sync) rather than to a DB table
  • PIH Rwanda found that for admin/management purposes it’s really valuable to synchronize status data about each slave up to the master, so that some level of system debugging can be done centrally

“Catchment Strategies”:

  • I guess we should get some feedback from potential implementations about what are the right strategies to focus on

“FHIR Resource List for Sync”

  • Naively I would think Drug => Medication (instead of Substance)
  • Some of these may be very imperfect matches, and after analysis we might end up wanting to do some of these via OpenMRS REST representations.
  • There’s definitely a prioritization to be done here. E.g. some implementation might start using sync with just patient. Others would use it if you add visit, encounter, and obs. Etc.

(I will try to comment on the rest of the pages tomorrow, but I’ll post this now, in case I get delayed on the rest.)

Nice to see this going. A few comments:

  • we should not call it “master-slave”. IMO, it is not about replication - it’s about synchronizing relevant information between a server and its clients. Terminologies matter. So I would refrain from using “master-slave” and just use “server and client” everywhere. In our original paper, we had talked about how each client node can potentially be a master for its clients in the hierarchy. Essentially, each client decides what to process (push or pull) to the server.
  • I would advise to keep the FHIR resource handling as open as possible. Simply because no openmrs data storage (and blame obs hierarchical structure for it) is exactly the same. Same for medication order or statement.
  • While we may never be able to come to agreements about data structures across OpenMRS implementations and their mapping to FHIR resource universally, we should have series of discussions about the essential entities mapping mechanisms. e.g. How do we represent OpenMRS.Encounter to FHIR encounter? Does it make sense to construct this as FHIR composition? Agreement at broad level would help design the mapping and processing and also event ordering and processing.

In Bangladesh, we did all these for integrating with an HIE and using SimpleFeed as a protocol for event notification! It has served the purpose very well, but we needed to do a whole lotta work to ensure tackling problems (including ongoing support, monitoring) - many of which are relevant to out-of-order-event processing.

The reason I say this repeatedly is that I think there cannot be a single prescribed mechanism for addressing such distributed system synchronization needs. All you can have is a broad framework that will probably deal with 80% of the cases, while allowing for extensibility and customization. If we fail to accommodate the extensibility, then it will not serve a general purpose (it will serve a specific purpose very well for sure).

@pgesek, thanks for pushing the project forward!

Do you have some time aside in the coming days to finalize planning and/or any estimate when development could start?

I’m less available these days due to another project I’m involved in, but will help as much as I can. Let me know, if there are any gaps, which I could help you fill in.

Thanks for doing all this @pgesek!

I’ve tried to skim through everything, a few pieces of feedback:

  • +1 to using Hibernate Interceptors instead of AOP to figure out what to publish to the feed, as AOP is way too brittle. Maybe I’m forgetting, but is there a reason we wouldn’t be using the existing Event module for this? @raff @darius

  • The feed mechanism should be robust enough to handle a slave being down on the order of days, or perhaps even months

  • Darius may be right that storing all the audit information to a DB table might be excessive, but from experience debugging Sync 1.0 issues, I do feel that having good admin tools for resolving issues, etc, will be critical, so I’d err on the side of whatever design you feel will allow these tools to be easiest to design, develop, and tweak.

  • I see @angshuonline’s point about “master/slave” not being the correct terminology (https://en.wikipedia.org/wiki/Master/slave_(technology)). Not sure if I like client/server though as it might not be clear enough. Is there a reason not to use parent/child, as was used in the initial Sync module?

Take care, Mark

Thanks for the responses, my replies:

Yes, that makes sense and should be cleaner, I’ll update the feed doc.

An option is to use multiple link tags. Would be best imo to include the url configuration into this new atomfeed configuration file.

Yes, error handling is probably the biggest gripe users have with Sync 1.0, hence the amount of logging. I agree that the success logs are probably not as important as the failure ones - we can log success messages to a file by default. Perhaps it would be best to allow users to configure which logger implementation to use for event types like success and failure. Sync could provide some default implementations like DB, File, No-op and implementers could inject their own if need be.
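A minimal sketch of what that pluggable logging could look like, assuming one logger is chosen per event type. All class names here are hypothetical, the in-memory logger only stands in for a DB table, and the "file" logger simply relies on the standard `logging` module being configured with a file handler.

```python
import logging
from abc import ABC, abstractmethod


class SyncAuditLogger(ABC):
    """Hypothetical extension point: one implementation per storage choice."""

    @abstractmethod
    def log(self, event_type, resource_url, detail):
        ...


class NoOpAuditLogger(SyncAuditLogger):
    def log(self, event_type, resource_url, detail):
        pass  # deliberately discard the message


class FileAuditLogger(SyncAuditLogger):
    """Delegates to stdlib logging; a file handler is assumed to be configured."""

    def __init__(self, logger=None):
        self._logger = logger or logging.getLogger("sync.audit")

    def log(self, event_type, resource_url, detail):
        self._logger.info("%s %s %s", event_type, resource_url, detail)


class InMemoryAuditLogger(SyncAuditLogger):
    """Stand-in for a DB-backed logger; keeps rows in a list."""

    def __init__(self):
        self.rows = []

    def log(self, event_type, resource_url, detail):
        self.rows.append((event_type, resource_url, detail))


class AuditDispatcher:
    """Routes each audit message to the logger configured for its event type."""

    def __init__(self, loggers_by_type, default=None):
        self._loggers = dict(loggers_by_type)
        self._default = default or NoOpAuditLogger()

    def record(self, event_type, resource_url, detail=""):
        self._loggers.get(event_type, self._default).log(event_type, resource_url, detail)


failures = InMemoryAuditLogger()
dispatcher = AuditDispatcher({"success": FileAuditLogger(), "failure": failures})
dispatcher.record("success", "/ws/fhir/Patient/abc-123")
dispatcher.record("failure", "/ws/fhir/Encounter/def-456", "HTTP 500")
```

With this wiring, successes go to the regular log while failures land in the queryable store, which is where admin tooling would most likely need them.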

I’ve done some basic ordering of the list, I’ve also added a FHIR maturity level column.

On that page I listed the catchment strategies I got from implementers - feel free to add any additional ones.

Yes, my bad.

My hope is that on the sending/receiving end, we will have strategies that should allow controlling the mapping. We should also make it always possible for an implementation to inject their own client implementation for a given OpenMRS class (FHIR or not). It should also be possible to register additional providers on the receiving side in the FHIR module. I’ll start an extension section in the FHIR doc to gather information on extending how it works.

I don’t have a strong opinion on the terminology, I used what was in the project description. If Sync 1.0 uses parent/child then perhaps it’s best to stick to it.

Thanks, I’ll talk with Jakub and come back to you on this. Just note I’ll be on vacation next week (18.09 - 22.09)

Regards, Paweł

Thanks @pgesek!

One other thing that occurred to me… have there been thoughts on how to handle conflicts? Apologies if I missed the details in the documentation…

Currently Sync 1.0 doesn’t really have a strategy for resolving conflicts (beyond “last in wins”)… it would be great if we could at least flag conflicts (perhaps based on dateCreated and dateChanged?). Conflict resolution can certainly become complex, so I’d understand keeping it simple for Phase 1, but if we could have something better than the current “last in wins” (without any notification) that would be great.

Take care, Mark

(More comments, which I wrote over several days, so are probably chaotic)

“Metadata resource list” comments:

  • The word metadata is used incorrectly throughout these wiki pages, to mean “stuff that can’t be represented in FHIR”. A lot of our metadata can’t be represented in FHIR, so there’s overlap, but really the distinction is things that CAN vs CANNOT be suitably represented using FHIR.
  • Order is important clinical data that really needs to be using FHIR (e.g. DrugOrder => MedicationRequest)
  • GlobalProperty needs some mechanism to decide whether to sync things at the row level (i.e. some of these are system settings that you wouldn’t want to sync; I’m not sure if there are any GPs that you do want to sync)
  • I would sequence the work so that metadata sync happens later on. E.g. if an implementation can get all their metadata to be consistent through some other deployment process, they can adopt sync early, even without this function.

“Sync 2.0 Architecture Overview” comments:

  • This paragraph is confusing and I don’t understand it:

synchronization can be done in two directions … support both independently, as one-way sync is being used in the field currently by implementations adopting Sync 1.0. … All synchronization will be initiated by slaves, independent from the direction that data is transmitted

  • I think it’s clearer to say that:
    • synchronization is always initiated by a client connecting to its server
    • once the connection is made, the client may pull data down, push data up, or both (depending on the configuration), to support multiple use cases
  • I agree with @angshuonline’s comment to avoid “master/slave” terminology.
  • Since we’ll support multi-level sync, this should be shown in the first diagram
  • “The master will expose a complete feed of events - slaves will be in charge of filtering and pulling only the resources they are interested in” => consider individual feeds for different catchment areas?
  • “Metadata retrieved through the REST API will have to be inserted using conventional methods - best if it directly interfaces with OpenMRS repository classes to insert the metadata into the db.” If we’re going to write a mechanism for turning OpenMRS REST responses into OpenMRS domain objects, we should do this in a reusable way, e.g. producing a REST client library as part of the REST module.
  • “it should be also possible to make the slave proceed through the feed even if a record fails to synchronize” => yes, but we also need a default behavior where a failed sync of a metadata item doesn’t lead to a huge cascade of failed syncs of data items that depend on it
  • I found the big architecture diagram to be confusing. Maybe it would be clearer if you explicitly separate the pull and push workflows
  • Why is ClientFacade called that? (Maybe this is more clearly described elsewhere, but I’m not sure how the Facade pattern is relevant.)
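The "client always initiates, then pulls, pushes, or both" wording suggested above could be sketched roughly like this. `SyncConfig`, `run_sync_cycle`, and the two callables are hypothetical names for illustration, and having both flags default to off is an assumption of the sketch rather than the documented design.

```python
from dataclasses import dataclass


@dataclass
class SyncConfig:
    # Both directions are off until an implementation configures them.
    pull_enabled: bool = False
    push_enabled: bool = False


def run_sync_cycle(config, pull, push):
    """Run one client-initiated cycle and return the steps that were executed.

    `pull` and `push` are callables supplied by the caller; the server never
    starts a cycle on its own in this sketch.
    """
    steps = []
    if config.pull_enabled:
        pull()  # read the server's feed and apply new events locally
        steps.append("pull")
    if config.push_enabled:
        push()  # send locally recorded changes up to the server
        steps.append("push")
    return steps


log = []
one_way_up = SyncConfig(push_enabled=True)
executed = run_sync_cycle(one_way_up, lambda: log.append("pulled"), lambda: log.append("pushed"))
print(executed, log)
```

In a multi-level setup the same function would run on every node that has a parent, so a mid-level node acts as a client towards its own server and as a server towards its own clients.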

“Sync 2.0 configuration variables” comments:

  • I would default the “enabled” variables to false (since things won’t actually work without some configuration)
  • Is the idea that changes would be captured and written to the atom feed always, regardless of whether push and/or pull are enabled? We might also want a setting like sync.enabled that controls this.

About error handling: are we going to preserve the downloaded REST/FHIR response to replay again? Or would we re-fetch it from its url?

The event module uses ActiveMQ, which has been occasionally buggy for us. (Maybe we’re just using a 5 year old version of it.) We should definitely consider either modernizing the event module, or writing a from-scratch replacement, that can be used for Sync 2.0 and other things too. But for Sync 2.0 the real requirement is to have something that goes from hibernate interceptor events to atom feed events. We could go via a message broker like in the event module, which could be more reusable, but also adds (unnecessary?) complexity.

Good point. So, yes, it’s fair to prioritize logging. I wouldn’t over-complicate by giving too much configuration about which loggers to use. Just pick a good default, at least in the first pass.

I was also thinking about this. Personally I would prioritize “quicker to develop” above “conflict resolution”. (Ultimately, it would be nice to capture a complete change history in openmrs-core, or in a multi-purpose module, not just for sync.)

Even to do something simple like comparing based on dateCreated/dateChanged requires that we persist an extra piece of data on each update (i.e. what was its original created/changed timestamp) so we can do a 3-way comparison when merging. If this can be done while generating the atom feed, without adding lots of complexity, it could be worthwhile.
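To illustrate the 3-way comparison, here is a minimal sketch that assumes each feed entry carries both the timestamp the record had before the sender's edit (the base) and the timestamp after it. The function and enum names are hypothetical, and "timestamp" means dateChanged, falling back to dateCreated when the record was never edited.

```python
from datetime import datetime
from enum import Enum


class MergeOutcome(Enum):
    APPLY = "apply"
    ALREADY_APPLIED = "already applied"
    CONFLICT = "conflict"


def classify_update(local_changed, base_changed, incoming_changed):
    """Compare three timestamps for one record.

    local_changed:    timestamp of the receiver's current copy
    base_changed:     timestamp the sender's copy had before its edit
    incoming_changed: timestamp the sender's copy has after its edit
    """
    if local_changed == incoming_changed:
        return MergeOutcome.ALREADY_APPLIED  # the same edit was seen before
    if local_changed == base_changed:
        return MergeOutcome.APPLY  # receiver still holds the version the sender edited
    return MergeOutcome.CONFLICT  # receiver's copy changed independently


t0 = datetime(2017, 9, 1, 8, 0)
t1 = datetime(2017, 9, 2, 9, 30)
t2 = datetime(2017, 9, 3, 14, 15)

print(classify_update(t0, t0, t1))
print(classify_update(t2, t0, t1))
```

The first call is a clean update and the second is flagged as a conflict, because the receiver edited the record after the common base. Flagging is all the sketch does; what to do with a conflict is left open.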

I wonder if we can leave a hook to add conflict resolution in at a later date, once a separate module has been built that captures a full change log.

Agree that “conflict resolution” could be a bit of a beast and we shouldn’t get too bogged down with doing it perfectly. However I think we certainly should consider having at least some sort of conflict notification in Phase 1 or at least some sort of explicit warning to prevent people from doing “dangerous” things.

Sync 1.0 has no conflict resolution or warnings, and the general rule of thumb with Sync 1.0 is that “you never should be editing the same patient on both the parent and the child” (or two children, for that matter)… but I find myself forgetting that rule a lot, never mind an everyday end user. It’s kind of scary because it can silently lead to data corruption (and I have no way of saying whether that has actually happened in our current implementations).

Take care, Mark

A few quick thoughts:

If we can’t use the event module for this… what exactly is the purpose of the event module? If we think the implementation of the event module is buggy, then we should fix the implementation, right? It would be nice for us to deal with hibernate interceptor messiness once, and do it properly, in the event module, and then for us to rely on that for a way to publish / subscribe to changes.

Presumably, there could be an Event listener that we support which would simply hit the REST endpoint, get the JSON, and log this - maybe to an external document database, along with the other information from the event (event type, data type, uuid, etc). Then, implementations that want a full audit log and have the terabytes to spare could simply enable this. This would seem like it would be pretty low-effort to add in.
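A rough sketch of such a listener, assuming the event system hands over an event type, a data type, and a uuid. The class name, the URL template, and the injected `fetch_json` and `store` collaborators are all hypothetical; a real version would use an HTTP client and a document database in their place.

```python
import json


class AuditingEventListener:
    """Hypothetical listener: fetch the REST JSON for each event and archive it."""

    def __init__(self, fetch_json, store):
        self._fetch_json = fetch_json  # callable: url -> dict
        self._store = store            # any object with append(), e.g. a list

    def on_event(self, event_type, data_type, uuid):
        # URL template is an assumption for this sketch.
        url = f"/ws/rest/v1/{data_type}/{uuid}?v=full"
        record = {
            "eventType": event_type,
            "dataType": data_type,
            "uuid": uuid,
            "url": url,
            "body": self._fetch_json(url),
        }
        self._store.append(json.dumps(record, sort_keys=True))
        return record


# Stand-ins for the HTTP call and the document database.
def fake_fetch(url):
    return {"uuid": "abc-123", "display": "Example Patient"}


archive = []
listener = AuditingEventListener(fake_fetch, archive)
listener.on_event("UPDATED", "patient", "abc-123")
print(archive[0])
```

Because the listener is optional and only appends documents, an implementation that does not want a full audit log would simply not register it.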

Mike