I don’t see an issue with doing a new git repo - it has some advantages over branching.
Did we decide on using the existing Jira and Wiki? I noticed that there are about 50 unresolved issues for sync - that doesn’t necessarily mean we need a new project, though; we can just use the version fields to keep track of it.
If we are using the existing sync wiki, I can start putting together a doc based on the initial requirements and discussions on talk.
Thanks @pgesek I was just wondering whether there was an update on this.
Comments about “Atomfeed for Sync 2.0”:
If we’re going to refactor this, maybe we can move away from having a lot of global properties, and instead just have a single feed config that an implementation can specify with a json or yaml file
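To make this concrete, here's a hypothetical sketch of what such a single feed configuration file might look like in YAML (every key and value here is illustrative, not an existing Sync 2.0 format):

```yaml
# Hypothetical sync feed configuration - replaces the current pile of
# global properties with one file an implementation can ship.
feeds:
  - name: patient
    enabled: true
    resource: fhir/Patient
  - name: encounter
    enabled: true
    resource: fhir/Encounter
sync:
  pull: true
  push: false
  interval-seconds: 300
```

One advantage of a single file over scattered global properties is that an implementation's whole feed setup can be versioned and deployed alongside its other configuration.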
I like the idea of using an event-based hibernate interceptor instead of an advice class for every OpenMRS class
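The appeal of the interceptor approach is that one callback sees every entity change, instead of one advice class per OpenMRS service. A minimal self-contained sketch of that shape (this is a stand-in for Hibernate's real `Interceptor` callbacks like `onSave`/`onFlushDirty`; all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// A captured change, destined for the atom feed.
class SyncEvent {
    final String entityType;
    final String uuid;
    final String action; // "CREATE", "UPDATE", "DELETE"

    SyncEvent(String entityType, String uuid, String action) {
        this.entityType = entityType;
        this.uuid = uuid;
        this.action = action;
    }
}

// One interceptor covers all entity types - no per-class advice needed.
class FeedPublishingInterceptor {
    private final List<SyncEvent> feed = new ArrayList<>();

    // In a real Hibernate Interceptor this logic would live in
    // onSave/onFlushDirty/onDelete; here we just show the single hook.
    void onEntityChange(String entityType, String uuid, String action) {
        feed.add(new SyncEvent(entityType, uuid, action));
    }

    List<SyncEvent> publishedEvents() {
        return feed;
    }
}
```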
“Change the default urls to point to FHIR resources” => I would hope that Sync 2.0 can also be run with Bahmni, so I hope there’s a way that a single set of feeds can work for both master-slave sync, and Bahmni’s MRS/ERP/LIS communication. Maybe we can introduce a new format where we provide multiple possible links to the event, e.g. both the FHIR one and the REST one.
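The multiple-links idea could reuse Atom's existing `link` element, distinguished by `type`. A hypothetical entry might look like this (URLs and ids invented for illustration):

```xml
<entry>
  <title>Patient updated</title>
  <id>tag:example.org,2017:patient-event-123</id>
  <updated>2017-09-12T10:15:00Z</updated>
  <!-- Hypothetical: one link per available representation -->
  <link rel="alternate" type="application/fhir+json"
        href="https://example.org/openmrs/ws/fhir/Patient/some-uuid"/>
  <link rel="alternate" type="application/json"
        href="https://example.org/openmrs/ws/rest/v1/patient/some-uuid"/>
</entry>
```

A consumer (master-slave sync, or Bahmni's MRS/ERP/LIS feeds) would then pick whichever representation it understands.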
Comments about “Auditing and error handling”:
I wonder if this is going to be an excessive amount of data to be storing
It feels more natural to log this to a file (at least for successful sync) rather than to a DB table
PIH Rwanda found that for admin/management purposes it’s really valuable to synchronize status data about each slave up to the master, so that some level of system debugging can be done centrally
I guess we should get some feedback from potential implementations about what are the right strategies to focus on
“FHIR Resource List for Sync”
Naively I would think Drug => Medication (instead of Substance)
Some of these may be very imperfect matches, and after analysis we might end up wanting to do some of these via OpenMRS REST representations.
There’s definitely a prioritization to be done here. E.g. some implementation might start using sync with just patient. Others would use it if you add visit, encounter, and obs. Etc.
(I will try to comment on the rest of the pages tomorrow, but I’ll post this now, in case I get delayed on the rest.)
we should not call it “master-slave”. IMO, it is not about replication - it’s about synchronizing relevant information between a server and its clients. Terminology matters. So I would refrain from using “master-slave” and just use “server and client” everywhere. In our original paper, we had talked about how each client node can potentially be a master for its clients in the hierarchy. Essentially, each client decides what to process (push or pull) to the server.
I would advise to keep the FHIR resource handling as open as possible. Simply because no openmrs data storage (and blame obs hierarchical structure for it) is exactly the same. Same for medication order or statement.
While we may never be able to come to universal agreement about data structures across OpenMRS implementations and their mapping to FHIR resources, we should have a series of discussions about the mapping mechanisms for the essential entities. e.g. How do we represent OpenMRS.Encounter as a FHIR Encounter? Does it make sense to construct this as a FHIR Composition? Agreement at a broad level would help design the mapping and processing, and also event ordering and processing.
In Bangladesh, we did all of this for integrating with an HIE, using SimpleFeed as the protocol for event notification! It has served the purpose very well, but we needed to do a whole lot of work to tackle problems (including ongoing support and monitoring) - many of which are relevant to out-of-order event processing.
The reason I say this repeatedly is that I think there cannot be a single prescribed mechanism for addressing such distributed system synchronization needs. All you can have is a broad framework that will probably deal with 80% of the cases, while allowing for extensibility for customization. If we fail to accommodate extensibility, then it will not serve the general purpose. (It will serve specific purposes very well, for sure.)
I’ve tried to skim through everything, a few pieces of feedback:
+1 to using Hibernate Interceptors instead of AOP to figure out what to publish to the feed, as AOP is way too brittle. Maybe I’m forgetting, but is there a reason we wouldn’t be using the existing Event module for this? @raff @darius
The feed mechanism should be robust enough to handle a slave being down on the order of days, or perhaps even months
Darius may be right that storing all the audit information to a DB table might be excessive, but from experience debugging Sync 1.0 issues, I do feel that having good admin tools for resolving issues, etc, will be critical, so I’d err on the side of whatever design you feel will allow these tools to be easiest to design, develop, and tweak.
Yes, that makes sense and should be cleaner, I’ll update the feed doc.
An option is to use multiple link tags. It would be best, IMO, to include the URL configuration in this new atomfeed configuration file.
Yes, error handling is probably the biggest gripe users have with Sync 1.0, hence the amount of logging. I agree that the success logs are probably not as important as the failure ones - we can log success messages to a file by default. Perhaps it would be best to allow users to configure which logger implementation to use for event types like success and failure. Sync could provide some default implementations like DB, File, No-op and implementers could inject their own if need be.
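The pluggable-logger idea above could be as simple as one strategy interface with swappable implementations. A minimal sketch (all names hypothetical; a real DB logger would write to a table rather than a list):

```java
import java.util.ArrayList;
import java.util.List;

// One logger strategy per event type (success, failure, ...);
// implementers can inject their own.
interface SyncLogger {
    void log(String eventUuid, String message);
}

// No-op: discard everything (a reasonable default for success events).
class NoOpSyncLogger implements SyncLogger {
    public void log(String eventUuid, String message) {
        // intentionally empty
    }
}

// In-memory stand-in for a DB-backed logger (e.g. for failures,
// so admin tools can query and resolve them later).
class DbSyncLogger implements SyncLogger {
    final List<String> rows = new ArrayList<>();

    public void log(String eventUuid, String message) {
        rows.add(eventUuid + ": " + message);
    }
}
```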
I’ve done some basic ordering of the list, I’ve also added a FHIR maturity level column.
On that page I listed the catchment strategies I got from implementers - feel free to add any additional ones.
Yes, my bad.
My hope is that on the sending/receiving end, we will have strategies that should allow controlling the mapping. We should also make it always possible for an implementation to inject their own client implementation for a given OpenMRS class (FHIR or not). It should also be possible to register additional providers on the receiving side in the FHIR module. I’ll start an extension section in the FHIR doc to gather information on extending how it works.
I don’t have a strong opinion on the terminology, I used what was in the project description. If Sync 1.0 uses parent/child then perhaps it’s best to stick to it.
Thanks, I’ll talk with Jakub and come back to you on this. Just note I’ll be on vacation next week (18.09 - 22.09)
One other thing that occurred to me… have there been thoughts on how to handle conflicts? Apologies if I missed the details in the documentation…
Currently Sync 1.0 doesn’t really have a strategy for resolving conflicts (beyond “last in wins”)… it would be great if we could at least flag conflicts (perhaps based on dateCreated and dateChanged?). Conflict resolution can certainly become complex, so I’d understand keeping it simple for Phase 1, but if we could have something better than the current “last in wins” (without any notification) that would be great.
(More comments, which I wrote over several days, so are probably chaotic)
“Metadata resource list” comments:
The word metadata is used incorrectly throughout these wiki pages, to mean “stuff that can’t be represented in FHIR”. A lot of our metadata can’t be represented in FHIR, so there’s overlap, but really the distinction is things that CAN vs CANNOT be suitably represented using FHIR.
Order is important clinical data that really needs to be using FHIR (e.g. DrugOrder => MedicationRequest)
GlobalProperty needs some mechanism to decide whether to sync things at the row level (i.e. some of these are system settings that you wouldn’t want to sync; I’m not sure if there are any GPs that you do want to sync)
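One possible shape for that row-level mechanism is a simple allowlist filter consulted before a GlobalProperty event is published; properties not on the list stay local. A sketch (the class and property names are hypothetical):

```java
import java.util.Set;

// Row-level filter for GlobalProperty sync: only properties on an
// explicit allowlist are synchronized; local system settings are skipped.
class GlobalPropertySyncFilter {
    private final Set<String> allowlist;

    GlobalPropertySyncFilter(Set<String> allowlist) {
        this.allowlist = allowlist;
    }

    boolean shouldSync(String propertyName) {
        return allowlist.contains(propertyName);
    }
}
```

An implementation would ship its allowlist (possibly empty, per the comment above that there may be no GPs worth syncing) as part of its sync configuration.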
I would sequence the work so that metadata sync happens later on. E.g. if an implementation can get all their metadata to be consistent through some other deployment process, they can adopt sync early, even without this function.
“Sync 2.0 Architecture Overview” comments:
This paragraph is confusing and I don’t understand it:
synchronization can be done in two directions … support both independently, as one-way sync is being used in the field currently by implementations adopting Sync 1.0. … All synchronization will be initiated by slaves, independent from the direction that data is transmitted
I think it’s clearer to say that:
synchronization is always initiated by a client connecting to its server
once the connection is made, the client may pull data down, push data up, or both (depending on the configuration), to support multiple use cases
I agree with @angshuonline’s comment to avoid “master/slave” terminology.
Since we’ll support multi-level sync, this should be shown in the first diagram
“The master will expose a complete feed of events - slaves will be in charge of filtering and pulling only the resources they are interested in” => consider individual feeds for different catchment areas?
“Metadata retrieved through the REST API will have to be inserted using conventional methods - best if it directly interfaces with OpenMRS repository classes to insert the metadata into the db.” If we’re going to write a mechanism for turning OpenMRS REST responses into OpenMRS domain objects, we should do this in a reusable way, e.g. producing a REST client library as part of the REST module.
“it should be also possible to make the slave proceed through the feed even if a record fails to synchronize” => yes, but we also need a default behavior where a failed sync of a metadata item doesn’t lead to a huge cascade of failed syncs of data items that depend on it
I found the big architecture diagram to be confusing. Maybe it would be clearer if you explicitly separate the pull and push workflows
Why is ClientFacade called that? (Maybe this is more clearly described elsewhere, but I’m not sure how the Facade pattern is relevant.)
“Sync 2.0 configuration variables” comments:
I would default the “enabled” variables to false (since things won’t actually work without some configuration)
Is the idea that changes would be captured and written to the atom feed always, regardless of whether push and/or pull are enabled? We might also want a setting like sync.enabled that controls this.
About error handling: are we going to preserve the downloaded REST/FHIR response to replay again? Or would we re-fetch it from its url?
The event module uses ActiveMQ, which has been occasionally buggy for us. (Maybe we’re just using a 5 year old version of it.) We should definitely consider either modernizing the event module, or writing a from-scratch replacement, that can be used for Sync 2.0 and other things too. But for Sync 2.0 the real requirement is to have something that goes from hibernate interceptor events to atom feed events. We could go via a message broker like in the event module, which could be more reusable, but also adds (unnecessary?) complexity.
Good point. So, yes, it’s fair to prioritize logging. I wouldn’t over-complicate by giving too much configuration about which loggers to use. Just pick a good default, at least in the first pass.
I was also thinking about this. Personally I would prioritize “quicker to develop” above “conflict resolution”. (Ultimately, it would be nice to capture a complete change history in openmrs-core, or in a multi-purpose module, not just for sync.)
Even to do something simple like comparing based on dateCreated/dateChanged requires that we persist an extra piece of data on each update (i.e. what was its original created/changed timestamp) so we can do a 3-way comparison when merging. If this can be done while generating the atom feed, without adding lots of complexity, it could be worthwhile.
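The 3-way comparison described above boils down to: the incoming record carries the `dateChanged` baseline it was edited against, and if the local copy has moved past that baseline, both sides changed concurrently and we should flag a conflict rather than silently applying "last in wins". A minimal sketch (class and method names hypothetical):

```java
import java.time.Instant;

// Flags concurrent edits using the extra persisted timestamp described
// above: the dateChanged the sender's copy was based on.
class ConflictDetector {
    static boolean isConflict(Instant localDateChanged, Instant baselineDateChanged) {
        // Local row was modified after the sender last saw it
        // => both sides edited the same record concurrently.
        return localDateChanged != null
                && baselineDateChanged != null
                && localDateChanged.isAfter(baselineDateChanged);
    }
}
```

This only *detects* conflicts; resolving them could stay "last in wins" in Phase 1, with the detection result feeding a notification or admin queue.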
I wonder if we can leave a hook to add conflict resolution in at a later date, once a separate module has been built that captures a full change log.
Agree that “conflict resolution” could be a bit of a beast and we shouldn’t get too bogged down with doing it perfectly. However I think we certainly should consider having at least some sort of conflict notification in Phase 1, or at least some sort of explicit warning to prevent people from doing “dangerous” things.
Sync 1.0 has no conflict resolution or warnings, and the general rule of thumb with Sync 1.0 is that “you never should be editing the same patient on both the parent and the child” (or two children, for that matter)… but I find myself forgetting that rule a lot, never mind an everyday end user. It’s kind of scary because it can silently lead to data corruption (and I have no way of saying whether that has actually happened in our current implementations).
If we can’t use the event module for this…what exactly is the purpose of the event module? If we think the implementation of the event module is buggy, then we should fix the implementation, right? It would be nice for us to deal with hibernate interceptor messiness once, and do it properly, in the event module, and then for us to rely on that for a way to publish / subscribe to changes.
Presumably, there could be an Event listener that we support which would simply hit the REST endpoint, get the JSON, and log this - maybe to an external document database, along with the other information from the event (event type, data type, uuid, etc). Then, implementations that want a full audit log and have the terabytes to spare could simply enable this. This would seem like it would be pretty low-effort to add in.
I agree that this could be an opportunity to either fix the event module, or rewrite it such that we have something more useful.
The event module uses a hibernate interceptor to capture all changes to OpenmrsObjects. This is useful for Sync, and in general. We should leverage this, since it has been decently debugged, and even has some end-to-end test coverage.
The event module also delivers these events asynchronously (and outside of the hibernate/db transaction) using ActiveMQ. Sync’s atom feeds are already going to support asynchronous reading, and having the write happen outside of the hibernate transaction might be a problem if you end up losing events in some failure mode.
I’m concerned that working asynchronously via ActiveMQ will add an extra moving part and possibility of failure, that we don’t need. We should research this when deciding where to invest effort while working on Sync.
(Maybe this is an opportunity to remove the ActiveMQ dependency entirely, and make the event module simpler, just supporting a simple synchronous Java callback, and then move ActiveMQ into an extension module, if it’s even needed anywhere. @mseaton, @mogoodrich do you know if you’ve used Event anywhere outside of sending radiology orders in Mirebalais/PIH-EMR? )
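The simple synchronous callback alternative being discussed could look roughly like this: listeners are plain Java callbacks invoked in-line (and therefore inside the same transaction), with no broker in between. A self-contained sketch, with all names illustrative rather than the event module's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Synchronous publish/subscribe: no ActiveMQ, no extra moving parts.
// Delivery happens before publish() returns, so an event cannot be
// lost by a failure between the DB commit and the broker.
class SynchronousEventBus {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    void publish(String eventPayload) {
        for (Consumer<String> listener : listeners) {
            listener.accept(eventPayload); // delivered in-line
        }
    }
}
```

Asynchronous delivery via a broker could then be layered on top as an optional extension, per the suggestion above.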
@darius, we don’t use it beyond that as far as I know. I would definitely support moving the use of ActiveMQ to be optional in some way (or removing it altogether), and focus on supporting a synchronous mechanism to start. Even if we just support the modest goal of eliminating all custom AOP and interceptors whose sole goal is to detect changes to one or more OpenMRS objects (which are all synchronous right now), and migrate these to use a common event module instead, this would be a reasonably big win.
The event module sends out the event asynchronously and I think it should stay that way, I personally don’t support the idea of completely changing this behavior just because of sync, how about if we make the behavior configurable?
@wyclif my understanding from the previous discussants is, since the current usage requirement is synchronous, they suggest making it the default, and then the asynchronous becomes the non default option that one has to turn on and configure.
I’m only aware of the registration core module; it fires events whenever a new patient is registered, but it uses the module to fire events rather than to listen for them. I think the atomfeed module also uses it to listen for events.