Preservation of Complex Obs data

I believe the consensus in healthcare is that medical data should always be preserved, along with records of corrections (except via some government-established procedure if there is ever a valid requirement to purge data). Generally, I believe the policy is to preserve data, not to “clean up hard drive space”.

Currently, some types of observations, like attached images (technically considered “Complex Obs”), can be deleted from the file system when the database Observation is voided. I have done some research on JIRA and see that this specific issue has gone through some changes over the years. There appears to have been some fluctuation within the community between one approach (saving the files) and the other (deleting them). I’d like to hear what the community’s needs and policies are, so that we meet the needs of users and patients rather than what an individual dev or group believes is “right” from their point of view. I’d appreciate any feedback, existing policies, or case studies to direct future work in this area. Since I’m primarily acquainted with the dev community, I’d like to invite comments from @AdvisoryCouncil, @janflowers, @dkayiwa, @wyclif, @mksd, and @mogoodrich.

I have described the current issue I am seeing here on a previous bug case that appears to have the same symptom for anyone technically inclined to read more: https://issues.openmrs.org/browse/LUI-135?focusedCommentId=259299&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-259299

Thanks all, Ken Long @long27km


@lober @akanter @paul @burke Thought you all might be interested in thinking through this question.


I think the underlying issue is regulatory requirements, which are complex, rather than preferences. It’s always good to look at regulations and decide if they have value, rather than simply rejecting them because they are from a different jurisdiction and compliance is not mandated.

Here’s a fairly random link that conveys the complexity of records retention in the US: different retention periods based on record type, federal guidelines, state law, and age of the patient, to name a few factors.

While the EMR is not the only element of a medical record, increasingly it contains information that is not found elsewhere (i.e., only an electronic copy exists). So, depending on the setting, it should not really be a question of preferring to preserve changes or to clean up the database; it should be a question of regulatory compliance.

Beyond regulation, I think it is also sound practice to be able to reconstruct the record, including deleted items, whether through an ever-growing database that includes data “marked as deleted”, or through an ever-growing and archived audit log.

Thanks for the thinking around this, Ken.

Generally, our early designs focused upon never “deleting” data per se… instead, voiding data simply removes it from the user experience, but doesn’t physically delete it from the database.

My instinct is that if complex obs are deleted with a void, then that is an oversight that should be remedied.

@burke: thoughts? :slight_smile:

I also believe that we should preserve this data. And it looks like that’s the direction code comments like this one are driving towards: https://github.com/openmrs/openmrs-core/blob/master/api/src/main/java/org/openmrs/api/ObsService.java#L169-L171

I see that there’s some discussion on the JIRA ticket about creating a global property that would allow configuring whether voided complex obs files are kept or deleted. I think that OpenMRS should specifically take a “do no harm” stance with our products here, and not allow the deletion of data out of the box from our codebase. Implementations could certainly modify their code to do so, but I would like that to be more of a hurdle for someone than just a configuration difference. @burke @paul @dkayiwa how do you feel about that?


I have no objection to that.


Fetching an obs is a read operation, and therefore should not make the obs dirty. Likewise, fetching a complex obs should not make that obs dirty.

However, fetching a complex obs triggers the use of a setter, Obs#setComplexData(ComplexData), that dirties the obs; see for example here (all complex data handlers end up doing that).
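
To make this concrete, here is a minimal sketch of the interaction (simplified stand-ins, not the actual core classes; only Obs#setComplexData(ComplexData) and the handlers’ read path correspond to real code): the handler has no public way to attach the loaded payload other than the setter, and the setter unconditionally flags the obs as changed.

class ComplexData {
	private final String title;
	private final Object data;

	ComplexData(String title, Object data) {
		this.title = title;
		this.data = data;
	}
}

class Obs {
	private ComplexData complexData;
	private boolean dirty;

	// The only public way for a handler to attach the loaded payload...
	public void setComplexData(ComplexData complexData) {
		this.complexData = complexData;
		markAsDirty(); // ...and every call flags the obs as modified, even on a pure read
	}

	public boolean isDirty() {
		return dirty;
	}

	private void markAsDirty() {
		dirty = true;
	}
}

class ExampleComplexObsHandler {
	// What a complex obs handler effectively does when asked to load data for display:
	public Obs getObs(Obs obs) {
		byte[] bytesReadFromDisk = new byte[0]; // stand-in for reading the file
		obs.setComplexData(new ComplexData("image.svg", bytesReadFromDisk));
		return obs; // this obs is now dirty even though nothing was edited
	}
}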

I believe that this is a design flaw and there might be unwanted side effects. @long27km, do you remember the one that you had stumbled upon a while back? What was the exact chain of events?

@mksd I first noticed this behavior when working on converting the Drawing module to use SVG. Even though there is code for immediately loading ComplexData (https://issues.openmrs.org/browse/TRUNK-5198) in the ObsService (which, in my opinion, should be lazy loaded in the Obs class getObs() itself), when using HFE FormSessionContext.getExistingObs() I was seeing that the data member was still null, so I started calling ObsService#getObs(Integer). (How this happens via Spring/Hibernate ORM, in relation to Encounter#setObs() obs loading, might also be worth documenting in the Encounter class? @wyclif maybe you’d be able to explain it clearly given your work with Annotations there.)

I later started setting the previous observation on updated observations, and I began to see that the file was being deleted, with corresponding FileNotFoundExceptions when viewing encounters and attachments. This is due to the combination of having a previous complex observation and loading the data via a Handler (which currently must call setComplexData() to set the ComplexData member). This causes the code path at ObsServiceImpl#saveObs(Obs, String) to take the branch that calls ObsServiceImpl#saveExistingObs(Obs, String), which in turn calls ObsServiceImpl#voidExistingObs(Obs, String, Obs), where this code actually deletes the file:

if (newObs.hasPreviousVersion() && newObs.getPreviousVersion().isComplex()) {
	// resolve the file on disk that backs the previous (now superseded) complex obs...
	File previousFile = AbstractHandler.getComplexDataFile(obs);
	// ...and delete it from the file system; this is the deletion being discussed here
	previousFile.delete();
}

In the HFE code path for Edit encounter that calls the Drawing SubmissionElement, even when no change has been made and no new file needs to be written, there is code that leads down this code path and deletes the file! (It actually REQUIRED creating a new obs/file to work around the deletion, which causes an unnecessary disk write!) This is a result of logic in core and not really an HFE-specific issue. Loading the data to show the user marks the obs dirty, which tells core that the file has been updated and needs to be rewritten, even though it was only viewed.

This code was introduced in 2.0 for https://issues.openmrs.org/browse/TRUNK-4538, and seems to be the cause of various cases like https://issues.openmrs.org/browse/LUI-135 on JIRA. I also found that HFE had a TODO to set previous observations, and implemented that feature (https://issues.openmrs.org/browse/HTML-675). It may just be that this was an edge case before, since the previous obs wasn’t necessarily being set that often (though it should be), or that it was even actively avoided if anyone encountered this issue before. This behavior seems like it needs attention in the short term, before it becomes a bigger problem in an implementation somewhere.

As I mentioned, lazy loading the data seems like one way to go (though it would also require clearing the dirty flag set by the call to setComplexData() (!), which does seem to indicate a design problem); alternatively, we could add an Obs constructor that takes a ComplexData argument. One other option @mksd mentioned to me in another discussion that’s worth considering is removing the markAsDirty() call from the Obs class’ setComplexData() and providing a method for the Handler to determine and set the dirty flag itself, since interpreting (or at least knowing how to save/load) the ComplexData is closer to being the Handler’s responsibility than the Obs class’.
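
As a rough sketch of that last option (hypothetical only; the markDirtyIfComplexDataChanged() method below does not exist in core, and the real classes are more involved), setComplexData() would become a plain assignment, and the Handler would flag the obs only when the payload it is about to save actually differs from what was loaded:

class ComplexData {
}

class Obs {
	private ComplexData complexData;
	private boolean dirty;

	public void setComplexData(ComplexData complexData) {
		// plain assignment only: loading data for display no longer dirties the obs
		this.complexData = complexData;
	}

	// Hypothetical, handler-facing entry point: flag the obs only on a real change.
	void markDirtyIfComplexDataChanged(byte[] loadedBytes, byte[] bytesToSave) {
		if (!java.util.Arrays.equals(loadedBytes, bytesToSave)) {
			dirty = true;
		}
	}

	public boolean isDirty() {
		return dirty;
	}
}

Keeping that method package-private (or exposing it only through an interface handed to handlers) is one way to limit who can call it, which ties into the misuse concern below.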

The biggest consideration with any approach is providing a way for the data to be loaded by a Handler without allowing another class to misuse the method(s) intended for Handler use. There’s also the issue of not making the implementation overly complex, which makes long-term upkeep harder. I think going to the point of using AOP here is probably overkill, but I welcome anyone well versed in it to present the case for it if they think it fits.

As for the argument of needing to save disk space when the change was a minor typo correction, maybe it’s worth setting up git in the complex_obs directory? (re: https://git-scm.com/book/en/v2/Git-Internals-Packfiles) Git does store the entire file by default:

Check the tree created by that last commit, and you see something interesting:

$ git cat-file -p master^{tree}
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob b042a60ef7dff760008df33cee372b945b6e884e      repo.rb
100644 blob e3f094f522629ae358806b17daf78246c27c007b      test.txt

The blob is now a different blob, which means that although you added only a single line to the end of a 400-line file, Git stored that new content as a completely new object:

$ git cat-file -s b042a60ef7dff760008df33cee372b945b6e884e
22054

You have two nearly identical 22K objects on your disk (each compressed to approximately 7K). Wouldn’t it be nice if Git could store one of them in full but then the second object only as the delta between it and the first?

…but git gc will pack those loose objects into deltas:

It turns out that it can. The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.

This could be added to the ObsServiceImpl for complex obs, rather than changing the Handlers.
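
As a very rough sketch of what that could look like (assuming git is installed on the server and the complex_obs directory has already been initialized with git init; the class and method names here are made up for illustration), the service layer could commit the superseded file instead of deleting it:

import java.io.File;
import java.io.IOException;

class ComplexObsArchiver {

	// Hypothetical alternative to previousFile.delete(): record the superseded file
	// in a git repository rooted at the complex_obs directory.
	static void archiveInsteadOfDelete(File complexObsDir, File previousFile, String voidReason)
			throws IOException, InterruptedException {
		runGit(complexObsDir, "add", previousFile.getName());
		// --allow-empty keeps the commit from failing when the file content did not actually change
		runGit(complexObsDir, "commit", "--allow-empty", "-m", "Void complex obs: " + voidReason);
		// optional housekeeping: pack loose objects into deltas to reclaim space
		runGit(complexObsDir, "gc", "--auto");
	}

	private static void runGit(File workDir, String... args) throws IOException, InterruptedException {
		String[] cmd = new String[args.length + 1];
		cmd[0] = "git";
		System.arraycopy(args, 0, cmd, 1, args.length);
		Process p = new ProcessBuilder(cmd).directory(workDir).inheritIO().start();
		if (p.waitFor() != 0) {
			throw new IOException("git " + String.join(" ", args) + " failed");
		}
	}
}

Given the packfile behavior quoted above, corrections to large attachments would eventually be delta-compressed by git gc rather than doubling the disk footprint indefinitely.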

Thoughts, @dkayiwa @mksd @wyclif @mogoodrich @burke ?

Most OpenMRS systems are deployed in many different countries with varying requirements regarding data retention, data ownership, data sharing, etc. The ICRC has published several handbooks to guide humanitarian organizations in the processing of data (please see the data retention section of the handbooks):

ICRC’s Handbook on Data Protection in Humanitarian Action

(https://shop.icrc.org/handbook-on-data-protection-in-humanitarian-action.html)

This Handbook was published as part of the Brussels Privacy Hub and ICRC’s Data Protection in Humanitarian Action project. It is aimed at the staff of humanitarian organizations involved in processing personal data as part of humanitarian operations, particularly those in charge of advising on and applying data protection standards.

The Handbook builds on existing guidelines, working procedures and practices established in humanitarian action in the most volatile environments and for the benefit of the most vulnerable victims of humanitarian emergencies. It seeks to help humanitarian organizations comply with personal data protection standards, by raising awareness and providing specific guidance on the interpretation of data protection principles in the context of humanitarian action, particularly when new technologies are employed.

ICRC Rules on Personal Data Protection

Safeguarding the personal data of individuals, particularly in testing conditions, such as armed conflicts and other humanitarian emergencies, is an essential aspect of protecting people’s lives, their physical and mental integrity, and their dignity – which makes it a matter of fundamental importance for the ICRC.

It touches all areas of its activity, whether operational or administrative. As a result, the ICRC has adopted the following set of rules for protecting personal data, which will also enable it to remain at the forefront of international humanitarian action, even in the most challenging circumstances.

THE HUMANITARIAN METADATA PROBLEM: “DOING NO HARM” IN THE DIGITAL ERA

https://www.icrc.org/en/download/file/85089/the_humanitarian_metadata_problem_-_icrc_and_privacy_international.pdf

The digital era is enabling spectacular advances in the field of humanitarian action. It makes it possible to improve the effectiveness and efficiency of aid provision in regions affected by war or disaster. The use of technology – including smartphones, drones and other connected objects – can provide more relevant responses to the expectations and needs of local communities. Meanwhile, social networks can play a crucial role in information and awareness-raising campaigns, and as a means of communication and information disclosure.

However, the use of these technologies also leaves traces. It can allow for the tracking and profiling of individuals, with great risk to their lives, integrity and dignity. This applies regardless of whether the individual is seeking humanitarian aid or providing it as a humanitarian staff member or volunteer. Already, the data generated by the humanitarian sector has stirred up interest within certain governments, groups and intelligence agencies, at the risk of undermining the impartiality, neutrality and independence that organisations like the ICRC must demonstrate.

@mksd
