Enhancement for Merge Patient Data module

Hi all,

Following up on the discussions about the Merge Patient Data from Multiple Installations project, we came up with ideas for improving the current implementation.

  • Optimizing memory management.

Looking at this code I see an “MPDStore” class that includes List<Patient>, List<Encounter>, and List<Obs>: https://github.com/samuelmale/openmrs-module-mergepatientdata/blob/master/api/src/main/java/org/openmrs/module/mergepatientdata/sync/MPDStore.java This implies that you’re putting a huge amount of data into a single object and then serializing it. This will not scale well and will eventually break down, because it requires you to load all the patients, encounters, and obs into memory at one time.

Instead you need to work with smaller batches of data at a time. This can either mean using a “streaming” API for the serializer, or it can mean using an object serializer but only passing it one chunk of data at a time. You have to do this for both export and import, because things will break if you try to load the entire database into memory.

So, the file format could actually be a zip file that has “part1.json”, “part2.json”, etc. You could do this by patient, e.g. each part has 10 patients plus all their encounters and obs. Or you could do this by object, e.g. 100 patients, then 100 more patients, then 100 more patients, then 100 encounters, then 100 more encounters, etc. But when you load things in you need to do it in the right order: Patients have to come before their Encounters, which come before their Obs. (Or if you were including patient relationships, those need to come after both patients.)
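To make the “part1.json”, “part2.json” idea concrete, here is a minimal sketch of a batched export using only the JDK's java.util.zip classes. It is not the module's actual code: the class name BatchedZipExporter, the export method, and the assumption that each record arrives already serialized as a JSON string from a lazy iterator (for example one that pages through the database) are all hypothetical. The point is only that at most one batch is held in memory at a time.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the mergepatientdata module.
public class BatchedZipExporter {

    /**
     * Writes the records into a zip archive as part1.json, part2.json, ...
     * with at most batchSize records per part. Each part is a JSON array.
     * Assumes every element of jsonRecords is already a valid JSON object string.
     * Returns the number of parts written.
     */
    public static int export(Iterator<String> jsonRecords, int batchSize, OutputStream out)
            throws IOException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int parts = 0;
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            while (jsonRecords.hasNext()) {
                parts++;
                zip.putNextEntry(new ZipEntry("part" + parts + ".json"));

                // Only this one batch is buffered in memory.
                StringBuilder batch = new StringBuilder("[");
                for (int i = 0; i < batchSize && jsonRecords.hasNext(); i++) {
                    if (i > 0) {
                        batch.append(',');
                    }
                    batch.append(jsonRecords.next());
                }
                batch.append(']');

                zip.write(batch.toString().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return parts;
    }
}
```

To respect the load order, the caller would feed the iterator patients first, then encounters, then obs, so that the part numbers already reflect the order in which the import has to read them.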

The above ideas were raised on today’s design call. Details on that can be found here

  • About file upload size

We suggest that you architect this around the idea that the module reads in data from a folder (the location can be hardcoded in the application data directory, or it could be configurable). The user can then do either of two things:

  1. Manually put a file in that folder (requires them to have filesystem access to the server).
  2. Upload through the web UI (file size will be limited though).
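As a rough illustration of the folder-based approach, the sketch below lists the archives waiting in an import folder, regardless of whether they were copied there by hand or saved there by the upload page. The class name ImportFolderScanner, the method findPendingArchives, and the assumption that exports are “.zip” files are hypothetical; how the module resolves the folder location is left to the caller.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the mergepatientdata module.
public class ImportFolderScanner {

    /**
     * Returns the *.zip files found directly inside importDir, sorted by path.
     * Returns an empty list if the folder does not exist yet.
     */
    public static List<Path> findPendingArchives(Path importDir) throws IOException {
        List<Path> archives = new ArrayList<>();
        if (!Files.isDirectory(importDir)) {
            return archives;
        }
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(importDir, "*.zip")) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    archives.add(candidate);
                }
            }
        }
        Collections.sort(archives);
        return archives;
    }
}
```

With this shape the web upload only has to save the uploaded file into the same folder, so both paths share one import routine.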

The above ideas were raised on today’s design call. Details on that can be found here

DJ: I assume the use case is that the two servers are remote from each other, e.g. you are moving data from a district to the central facility. You should ask Stephen Musoke (who was/is the project champion) to confirm that this new approach will work for his use case.

@ssmusoke, does this new workflow make sense for your use case?

Notes: https://notes.openmrs.org/2018-09-26-Design-Forum