Sync 2.0 Planning

pgesek · June 13, 2017, 5:41pm

Thanks @ayeung and @aramirez - we are looking forward to working with FGH Mozambique & Global Brigades during the development of Sync 2.0. Your testing and input on the design/development of Sync 2.0 will definitely help us build a solution that will meet the needs of the community.

@raff and @jthomas - I agree that the next step will be documenting on the wiki (in the Sync space) and organizing a design call to discuss further.

darius · June 13, 2017, 5:59pm

I would also consider the option that although this will be a replacement for the sync module, we might choose to give it a new name. And since the codebase will be 100% new there is no clear value to doing this in the old sync module’s git repo.

-Darius (by phone)

mogoodrich · June 13, 2017, 6:35pm

(thought I posted this in response to Mike, but just seeing now that I didn’t… )

Since it is going to be a completely new code base, doesn’t it make sense to start a new repo? I guess from a naming convention it does make it easier. Is there any kind of best practice around this type of situation?

That being said, if the general consensus is to use the same repo, I’m fine with that…

Mark

wyclif · June 13, 2017, 9:36pm

I’d also vote for having the new version in a new repo since the codebase is really completely different, what would be the module id though?

darius · June 13, 2017, 10:35pm

Angular 1 and 2 are in two different repos:

Jackson JSON 1 is in svn and 2 is in github:

OpenLMIS created a new set of repos when they did a complete refactor/rewrite (I advised on this):

GitHub - OpenLMIS/open-lmis: -PREVIOUS VERSION- OpenLMIS (Open Logistics Management Information System) version 2 core software for a shared, open source solution for managing medical commodity distribution in low- and middle-income countries. Technical forum: https://groups.google.com/forum/#!forum/openlmis-dev.
GitHub - OpenLMIS/openlmis-ref-distro: OpenLMIS v3+ Reference Distribution. Last mile health commodity information system.

So, it seems like large-scale code rewrites prefer to create new repos. (E.g. you would never want a developer to do a git pull and inadvertently go from sync 1.x to sync 2.x.)

I recommend that we delay on deciding the module id and the exact naming/marketing as we do design, figure out what different independent components are involved, what might overlap with a connect-to-MPI/HIE project, etc. (We can continue to call the project Sync 2.0 in the interim.)

werick · June 15, 2017, 4:06am

KEMRI-SEARCH and KEMRI-FACES are extensive users of sync 1.x version and I would be happy to be part of the testing team.

Erick

craigappl · June 16, 2017, 8:17pm

FYI @carl I want to make sure you’re informed about this conversation because you’re doing so much work with HIE and FHIR.

pgesek · June 28, 2017, 4:32pm

Thanks @werick, we will keep you in the loop.

I don’t see an issue with doing a new git repo - it has some advantages over branching.

Did we decide on using the existing Jira and Wiki? I noticed that there are about 50 unresolved issues for sync - doesn’t necessarily mean we need a new project, we can just use the version fields to keep track of that.

If we are using the existing sync wiki, I can start putting together a doc based on the initial requirements and discussions on talk.

rcrichton · June 29, 2017, 8:06am

This sounds like a great effort. We are rather invested in FHIR so this seems like a excellent move.

It may be worth considering the FHIR transaction/batch interactions to perform the sync in a push model. Otherwise, looking at performing global searches for a pull model.

I don’t believe FHIR makes use of ATOM feeds anymore, rather they have moved to using a Bundle resource to list multiple resources along with metadata.

raff · June 29, 2017, 9:23am

We can start documenting at Sync 2.0 - Projects - OpenMRS Wiki

Typically a JIRA project is linked with one repo so not sure, if we should reuse the JIRA project, but not the repo, unless the idea is to retire the old repo.

pgesek · September 7, 2017, 4:57pm

Hello everyone,

I have added design documentation for the module as child pages under:

https://wiki.openmrs.org/display/projects/Sync+2.0

Please let me know any feedback you have - I will be incorporating it in the documentation.

Thanks, Paweł

darius · September 8, 2017, 2:08am

Thanks @pgesek I was just wondering whether there was an update on this.

Comments about “Atomfeed for Sync 2.0”:

If we’re going to refactor this, maybe we can move away from having a lot of global properties, and instead just have a single feed config that an implementation can specify with a json or yaml file
I like the idea of using an event-based hibernate interceptor instead of an advice class for every OpenMRS class
“Change the default urls to point to FHIR resources” => I would hope that Sync 2.0 can also be run with Bahmni, so I hope there’s a way that a single set of feeds can work for both master-slave sync, and Bahmni’s MRS/ERP/LIS communication. Maybe we can introduce a new format where we provide multiple possible links to the event, e.g. both the FHIR one and the REST one.

Comments about “Auditing and error handling”:

I wonder if this is going to be an excessive amount of data to be storing
It feels more natural to log this to a file (at least for successful sync) rather than to a DB table
PIH Rwanda found that for admin/management purposes it’s really valuable to synchronize status data about each slave up to the master, so that some level of system debugging can be done centrally

“Catchment Strategies”:

I guess we should get some feedback from potential implementations about what are the right strategies to focus on

“FHIR Resource List for Sync”

Naively I would think Drug => Medication (instead of Substance)
Some of these may be very imperfect matches, and after analysis we might end up wanting to do some of these via OpenMRS REST representations.
There’s definitely a prioritization to be done here. E.g. some implementation might start using sync with just patient. Others would use it if you add visit, encounter, and obs. Etc.

(I will try to comment on the rest of the pages tomorrow, but I’ll post this now, in case I get delayed on the rest.)

angshuonline · September 8, 2017, 5:03am

Nice to see this going. few comments

we should not call “master-slave”. IMO, it is not about replication - its about synchronizing relevant information between a server and its clients. Terminologies matter. So I would refrain from using the “master-slave” and just use “server and client” everywhere. In our original paper, we had talked about how each client node can potentially be a master for its clients in the hierarchy. Essentially, each client decides what to process (push or pull) to the server.
I would advise to keep the FHIR resource handling as open as possible. Simply because no openmrs data storage (and blame obs hierarchical structure for it) is exactly the same. Same for medication order or statement.

While we may never be able to come to agreements about data structures across OpenMRS implementations and their mapping to FHIR resource universally, we should have series of discussions about the essential entities mapping mechanisms. e.g. How do we represent OpenMRS.Encounter to FHIR encounter? Does it make sense to construct this as FHIR composition? Agreement at broad level would help design the mapping and processing and also event ordering and processing.

In Bangladesh, we did all these for integrating with an HIE and using SimpleFeed as a protocol for event notification! It has served the purpose very well, but we needed to do a whole lotta work to ensure tackling problems (including ongoing support, monitoring) - many of which are relevant to out-of-order-event processing.

Why I say this repeatedly, I think that there can not be a singular subscribed mechanism for addressing such distributed system synchronization needs. All you can have is a broad framework that will probably deal with 80% of the cases, while allowing for extensibility for customization. If we fail to accommodate for the extensibility, then it will not serve general purpose. (it will serve specific purpose very well for sure)

raff · September 12, 2017, 11:50am

@pgesek, thanks for pushing the project forward!

Do you have some time aside in the coming days to finalize planning and/or any estimate when development could start?

I’m less available these days due to another project I’m involved in, but will help as much as I can. Let me know, if there are any gaps, which I could help you fill in.

mogoodrich · September 13, 2017, 4:06pm

Thanks for doing all this @pgesek!

I’ve tried to skim through everything, a few pieces of feedback:

+1 to using Hibernate Interceptors instead of AOP to figure out what to publish to the feed, as AOP is way to brittle. Maybe I’m forgetting, but is there a reason we wouldn’t be using the existing Event module for for this? @raff @darius
The feed mechanism should be robust enough to handle a slave being down on the order of days, or perhaps even months
Darius may be right that storing all the audit information to a DB table might be excessive, but from experience debugging Sync 1.0 issues, I do feel that having good admin tools for resolving issues, etc, will be critical, so I’d err on the side of whatever design you feel will allow these tools to be easiest to design, develop, and tweak.
I see @angshuonline point about “master/slave” not being the correct terminology (https://en.wikipedia.org/wiki/Master/slave_(technology)). Not sure if I like client/server though as it might not be clear enough. Is there a reason not to use parent/child, as was used in the initial Sync module?

Take care, Mark

pgesek · September 14, 2017, 7:32am

Thanks for the responses, my replies:

Yes, that makes sense and should be cleaner, I’ll update the feed doc.

An option is to use multiple link tags. Would be best imo to include the url configuration into this new atomfeed configuration file.

Yes, error handling is probably the biggest gripe users have with Sync 1.0, hence the the amount of logging. I agree that the success logs are probably not as important as the failure ones - we can log success messages to a file by default. Perhaps it would be best to allow users to configure which logger implementation to use for event types like success and failure. Sync could provide some default implementations like DB, File, No-op and implementers could inject their own if need be.

I’ve done some basic ordering of the list, I’ve also added a FHIR maturity level column.

On that page I listed the catchment strategies I got from implementers - feel free to add any additional ones.

Yes, my bad.

My hope is that on the sending/receiving end, we will have strategies that should allow controlling the mapping. We should also make it always possible for an implementation to inject their own client implementation for a given OpenMRS class (FHIR or not). It should also be possible to register additional providers on the receiving side in the FHIR module. I’ll start an extension section in the FHIR doc to gather information on extending how it works.

I don’t have a strong opinion on the terminology, I used what was in the project description. If Sync 1.0 uses parent/child then perhaps it’s best to stick to it.

Thanks, I’ll talk with Jakub and come back to you on this. Just note I’ll be on vacation next week (18.09 - 22.09)

Regards, Paweł

mogoodrich · September 14, 2017, 2:57pm

Thanks @pgesek!

One other thing that occurred to me… have there been thoughts on how to handle conflicts? Apologies if I missed the details in the documentation…

Currently Sync 1.0 doesn’t really have a strategy for resolving conflicts (beyond “last in wins”)… it would be great if we could at least flag conflicts (perhaps based on dateCreated and dateChanged?). Conflict resolution can certainly become complex, so I’d understand keeping it simple for Phase 1, but if we could have something better than the current “last in wins” (without any notification) that would be great.

Take care, Mark

darius · September 14, 2017, 8:59pm

(More comments, which I wrote over several days, so are probably chaotic)

“Metadata resource list” comments:

The word metadata is used incorrectly throughout these wiki pages, to mean “stuff that can’t be represented in FHIR”. A lot of our metadata can’t be represented in FHIR, so there’s overlap, but really the distinction is things that CAN vs CANNOT be suitably represented using FHIR.
Order is important clinical data that really needs to be using FHIR (e.g. DrugOrder => MedicationRequest)
GlobalProperty needs some mechanism to decide whether to sync things at the row level (i.e. some of these are system settings that you wouldn’t want to sync; I’m not sure if there are any GPs that you do want to sync)
I would sequence the work so that metadata sync happens later on. E.g. if an implementation can get all their metadata to be consistent through some other deployment process, they can adopt sync early, even without this function.

“Sync 2.0 Architecture Overview” comments:

This paragraph is confusing and I don’t understand it:

synchronization can be done in two directions … support both independently, as one-way sync is being used in the field currently by implementations adopting Sync 1.0. … All synchronization will be initiated by slaves, independent from the direction that data is transmitted

I think it’s clearer to say that:

synchronization is always initiated by a client connecting to its server
once the connection is made, the client may pull data down, push data up, or both (depending on the configuration), to support multiple use cases

I agree with @angshuonline’s comment to avoid “master/slave” terminology.
Since we’ll support multi-level sync, this should be shown in the first diagram
“The master will expose a complete feed of events - slaves will be in charge of filtering and pulling only the resources they are interested in” => consider individual feeds for different catchment areas?
“Metadata retrieved through the REST API will have to be inserted using conventional methods - best if it directly interfaces with OpenMRS repository classes to insert the metadata into the db.” If we’re going to write a mechanism for turning OpenMRS REST responses into OpenMRS domain objects, we should do this in a reusable way, e.g. producing a REST client library as part of the REST module.
" it should be also possible to make the slave proceed through the feed even if a record fails to synchronize" => yes, but we also need a default behavior where a failed sync of a metadata item doesn’t lead to a huge cascade of failed syncs of data items that depend on it
I found the big architecture diagram to be confusing. Maybe it would be clearer if you explicitly separate the pull and push workflows
Why is ClientFacade called that? (Maybe this is more clearly described elsewhere, but I’m not sure how the Facade pattern is relevant.)

“Sync 2.0 configuration variables” comments:

I would default the “enabled” variables to false (since things won’t actually work without some configuration)
Is the idea that changes would be captured and written to the atom feed always, regardless of whether push and/or pull are enabled? We might also want a setting like sync.enabled that controls this.

About error handling: are we going to preserve the downloaded REST/FHIR response to replay again? Or would we re-fetch it from its url?

The event module uses ActiveMQ, which has been occasionally buggy for us. (Maybe we’re just using a 5 year old version of it.) We should definitely consider either modernizing the event module, or writing a from-scratch replacement, that can be used for Sync 2.0 and other things too. But for Sync 2.0 the real requirements is to have something that goes from hibernate interceptor events to atom feed events. We could go via a message broker like in the event module, which could be more reusable, but also adds (unnecessary?) complexity.

Good point. So, yes, it’s fair to prioritize logging. I wouldn’t over-complicate by giving too much configuration about which loggers to use. Just pick a good default, at least in the first pass.

I was also thinking about this. Personally I would prioritize “quicker to develop” above “conflict resolution”. (Ultimately, it would be nice to capture a complete change history in openmrs-core, or in a multi-purpose module, not just for sync.)

Even to do something simple like comparing based on dateCreated/dateChanged requires that we persist an extra piece of data on each update (i.e. what was its original created/changed timestamp) so we can do a 3-way comparison when merging. If this can be done while generating the atom feed, without adding lots of complexity, it could be worthwhile.

I wonder if we can leave a hook to add conflict resolution in at a later date, once a separate module has been built that captures a full change log.

mogoodrich · September 14, 2017, 9:30pm

Agree that “conflict resolution” could be a bit of a beast and we shouldn’t get too bogged down with doing it perfectly. However I think we certainly should consider have at least some sort of conflict notification in Phase 1 or at least some sort of explicit warning to prevent people from doing “dangerous” things.

Sync 1.0 has no conflict resolution or warnings, and the general rule of thumb with Sync 1.0 is that “you never should be editing the same patient on both the parent and the child” (or two children, for that matter)… but I find myself forgetting that rule a lot, no mind an everyday end user. It’s kind of scary because it can silently lead to data corruption (and I have no way of saying whether that has actually happened in our current implementations).

Take care, Mark

mseaton · September 14, 2017, 9:40pm

A few quick thoughts:

If we can’t use the event module for this…what exactly is the purpose of the event module? If we think the implementation of the event module is buggy, then we should fix the implementation, right? It would be nice for us to deal with hibernate interceptor messiness once, and do it properly, in the event module, and then for us to rely on that for a way to publish / subscribe to changes.

Presumably, there could be an Event listener that we support which would simply hit the REST endpoint, get the JSON, and log this - maybe to an external document database, along with the other information from the event (event type, data type, uuid, etc). Then, implementations that want a full audit log and have the terabytes to spare could simply enable this. This would seem like it would be pretty low-effort to add in.

Mike