@mksd I first noticed this behavior while converting the Drawing module to use SVG. Even though there is code for immediately loading ComplexData (https://issues.openmrs.org/browse/TRUNK-5198) in the ObsService (which, in my opinion, should instead be lazy loaded in the Obs class itself), when using HFE’s FormSessionContext.getExistingObs() I was seeing that the data member was still null, so I resorted to calling ObsService#getObs(Integer). (How this happens via Spring/Hibernate ORM, in relation to obs loading in Encounter#setObs(), might also be worth documenting in the Encounter class. @wyclif, maybe you’d be able to explain it clearly given your work with the annotations there.)
I later started setting the previous observation on updated observations, and began to see the file being deleted, along with corresponding FileNotFoundExceptions when viewing encounters and attachments. This happens when an observation both has a previous complex observation and has its data loaded via a Handler (which currently must call setComplexData() to set the ComplexData member). That combination causes ObsServiceImpl#saveObs(Obs, String) to take the branch that calls ObsServiceImpl#saveExistingObs(Obs, String), which in turn calls ObsServiceImpl#voidExistingObs(Obs, String, Obs), where this code actually deletes the file:
if (newObs.hasPreviousVersion() && newObs.getPreviousVersion().isComplex()) {
    File previousFile = AbstractHandler.getComplexDataFile(obs);
    previousFile.delete();
}
In the HFE Edit-encounter code path that calls the Drawing SubmissionElement, even when no change has been made and no new file needs to be written, this code path is reached and the file is deleted! (Working around the deletion actually REQUIRED creating a new obs/file, causing an unnecessary disk write!) This is a result of logic in core, not really an HFE-specific issue: loading the data to show it to the user marks the obs dirty, which tells core that the file has been updated and needs to be rewritten, even though it was only viewed.
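To illustrate the failure mode, here is a deliberately stripped-down model of the current behavior. The class below is hypothetical, not the real openmrs-core Obs: the point is simply that one setter serves both “the user changed the data” and “a handler just loaded it from disk so it can be displayed”, so a pure view is indistinguishable from an edit:

```java
// Simplified, hypothetical model of today's behavior, not the real
// openmrs-core classes.
class Obs {
    private Object complexData;
    private boolean dirty;

    void setComplexData(Object complexData) {
        this.complexData = complexData;
        this.dirty = true; // markAsDirty() fires even for a read-only load
    }

    Object getComplexData() {
        return complexData;
    }

    boolean isDirty() {
        return dirty;
    }
}

class ViewOnlyFlow {
    // The handler "loads" the file purely so the UI can render it,
    // yet afterwards the obs claims it was modified.
    static boolean looksEditedAfterViewing(Obs obs) {
        obs.setComplexData(new byte[] {1, 2, 3});
        return obs.isDirty();
    }
}
```

In this model, looksEditedAfterViewing() always returns true, which is exactly the signal that sends saveObs() down the file-deleting branch above.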
This code was introduced in 2.0 for https://issues.openmrs.org/browse/TRUNK-4538, and seems to be the cause of various JIRA reports like https://issues.openmrs.org/browse/LUI-135. I also found that HFE had a TODO to set previous observations, and implemented that feature (https://issues.openmrs.org/browse/HTML-675). It may be that this was previously just an edge case, since the previous obs wasn’t necessarily being used that often (though it should be), or that it was even actively avoided by anyone who ran into this issue before. Either way, this behavior needs attention in the short term, before it becomes a bigger problem in an implementation somewhere.
As I mentioned, lazy loading the data seems like one way to go (though it would also require clearing the dirty flag set by the call to setComplexData(), which itself seems to indicate a design problem). Alternatively, we could add an Obs constructor that takes a ComplexData argument. One other option @mksd mentioned to me in another discussion that’s worth considering is removing the markAsDirty() call from the Obs class’ setComplexData() and providing a method for the Handler to determine and set the dirty flag itself, since interpreting (or at least knowing how to save/load) the ComplexData is closer to being the Handler’s responsibility than the Obs class’.
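A sketch of that last option. To be clear, the method names here (setComplexDataDirty() in particular) are made up for illustration and do not exist in openmrs-core today; the idea is only that the setter stops assuming a change happened and the Handler, which actually knows whether the bytes differ from what is on disk, sets the flag explicitly:

```java
// Hypothetical sketch only: setComplexDataDirty() is an invented name,
// not the actual openmrs-core API.
class Obs {
    private Object complexData;
    private boolean dirty;

    // The setter no longer assumes a change happened.
    void setComplexData(Object complexData) {
        this.complexData = complexData;
    }

    // Explicit knob intended for Handler use: the Handler is the one
    // that knows whether the bytes actually differ from what is on disk.
    void setComplexDataDirty(boolean dirty) {
        this.dirty = dirty;
    }

    boolean isDirty() {
        return dirty;
    }
}

class SketchHandler {
    // Loading for display: set the member, leave the dirty flag alone.
    void loadForView(Obs obs, Object bytesFromDisk) {
        obs.setComplexData(bytesFromDisk);
        obs.setComplexDataDirty(false);
    }

    // A genuine edit: only now does saveObs() need to rewrite the file.
    void applyEdit(Obs obs, Object newBytes) {
        obs.setComplexData(newBytes);
        obs.setComplexDataDirty(true);
    }
}
```

With this split, viewing an attachment no longer looks like an edit, so the void/delete branch in saveObs() is only taken when the Handler actually wrote new content.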
The biggest consideration for any approach is providing a way for the data to be loaded by a Handler without letting other classes misuse the method(s) intended for Handler use. There’s also the issue of not making the implementation overly complex, which makes long-term upkeep harder. I think going as far as AOP here is probably overkill, but I welcome anyone well versed in it to make the case if they think it fits.
As for the argument about saving disk space when the change was a minor typo correction, maybe it’s worth setting up git in the complex_obs directory? (re: https://git-scm.com/book/en/v2/Git-Internals-Packfiles) While Git does store the entire file by default, it can later repack near-identical files as deltas. Quoting that chapter:
Check the tree created by that last commit, and you see something interesting:

$ git cat-file -p master^{tree}
100644 blob fa49b077972391ad58037050f2a75f74e3671e92 new.txt
100644 blob b042a60ef7dff760008df33cee372b945b6e884e repo.rb
100644 blob e3f094f522629ae358806b17daf78246c27c007b test.txt

The blob is now a different blob, which means that although you added only a single line to the end of a 400-line file, Git stored that new content as a completely new object:

$ git cat-file -s b042a60ef7dff760008df33cee372b945b6e884e
22054

You have two nearly identical 22K objects on your disk (each compressed to approximately 7K). Wouldn’t it be nice if Git could store one of them in full but then the second object only as the delta between it and the first?
It turns out that it can. The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.

In other words, running git gc periodically would shrink those near-duplicate objects down to deltas.
This could be added to the ObsServiceImpl for complex obs, rather than changing the Handlers.
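To make that concrete, here is a rough sketch of what such a hook in ObsServiceImpl might look like. Everything here is hypothetical (the class and method names are invented, and it assumes the git CLI is available on the server); it simply shells out to git rather than pulling in a library like JGit:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: version the complex_obs directory with git so old
// file versions survive as cheap deltas instead of full copies. Assumes
// the git CLI is installed; names are illustrative, not openmrs-core API.
class ComplexObsGitArchiver {
    private final Path complexObsDir;

    ComplexObsGitArchiver(Path complexObsDir) {
        this.complexObsDir = complexObsDir;
    }

    // Initialize a repository in complex_obs the first time around.
    void ensureRepo() throws IOException, InterruptedException {
        if (!Files.isDirectory(complexObsDir.resolve(".git"))) {
            run("git", "init", "--quiet");
        }
    }

    // Commit the directory after a complex obs write, then repack loose
    // objects into deltified packfiles (the git gc step quoted above).
    void snapshot(String message) throws IOException, InterruptedException {
        run("git", "add", "--all");
        run("git", "-c", "user.name=openmrs", "-c", "user.email=openmrs@example.org",
                "commit", "--quiet", "--allow-empty", "-m", message);
        run("git", "gc", "--quiet");
    }

    private void run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .directory(complexObsDir.toFile())
                .redirectErrorStream(true)
                .start();
        p.getInputStream().transferTo(System.out);
        if (p.waitFor() != 0) {
            throw new IOException("command failed: " + String.join(" ", command));
        }
    }
}
```

A real version would need to decide when to commit (every save vs. batched), how to handle concurrent writes, and whether git gc should run out-of-band, but the basic mechanics are just a handful of commands.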
Thoughts, @dkayiwa @mksd @wyclif @mogoodrich @burke ?