O3 Performance Improvement

dkayiwa · May 1, 2024, 8:20pm

As we get closer to the first full release of O3, we are looking into areas that are slow and hence need performance improvement. Do you know of any such widgets or places? Feel free to share as many of them as you can here so that we can create tickets for them to be worked on.

cc @PalladiumKenya @METS @Mekom @PIH @UWash @UCSF @ICRC @GlobalSupportTeam @michaelbontyes @jnsereko

slubwama · May 2, 2024, 8:22am

@dkayiwa one of our observations regarding the speed of the REST web services API concerns the full representation: it is much slower than the default or a custom representation.
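
For readers unfamiliar with representations: REST web services requests select one through the `v` query parameter. The following is only a minimal sketch of how a client might ask for a slimmer payload. The base URL, the `resource_url` helper, and the chosen custom fields are all assumptions made for illustration, not part of any existing O3 code.

```python
from urllib.parse import urlencode

# Hypothetical base URL for illustration; substitute your own server.
BASE_URL = "https://example.org/openmrs/ws/rest/v1"


def resource_url(resource, uuid, representation="default", fields=None):
    """Build a REST URL that asks for a specific representation.

    `fields` only applies to the custom representation, e.g.
    ["uuid", "display"] becomes v=custom:(uuid,display).
    """
    if representation == "custom":
        if not fields:
            raise ValueError("custom representation needs at least one field")
        v = "custom:(" + ",".join(fields) + ")"
    else:
        v = representation
    # Keep the custom-representation punctuation readable in the query string.
    return f"{BASE_URL}/{resource}/{uuid}?" + urlencode({"v": v}, safe=":(),")


# "some-uuid" is a placeholder, not a real patient identifier.
full_url = resource_url("patient", "some-uuid", "full")
slim_url = resource_url("patient", "some-uuid", "custom", ["uuid", "display"])
print(full_url)
print(slim_url)
```

Requesting only the fields a widget actually renders is one way to avoid paying for the full representation on every call.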

aojwang · May 2, 2024, 10:08am

We can also think about bundling requests for registration (which involves person, person attributes, patient identifiers, observations, etc.) and about endpoints for form entry that support complex requests such as program enrolments. The current implementation has to manage and chain multiple requests to the server, which isn’t great with regard to error handling.
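
The error-handling concern can be shown with a toy simulation. This is a sketch only: `FakeServer` is an in-memory stand-in, and `post_bundle` is a hypothetical all-or-nothing endpoint that does not exist in the current API.

```python
class FakeServer:
    """In-memory stand-in for a server; not the real OpenMRS API."""

    def __init__(self, fail_on=None):
        self.saved = []          # resources persisted so far
        self.fail_on = fail_on   # name of the resource whose save should fail

    def post(self, name):
        # One request per resource: each success is committed immediately.
        if name == self.fail_on:
            raise RuntimeError(f"saving {name} failed")
        self.saved.append(name)

    def post_bundle(self, names):
        # Hypothetical bundled endpoint: commit everything or nothing.
        if self.fail_on in names:
            raise RuntimeError(f"saving {self.fail_on} failed; nothing committed")
        self.saved.extend(names)


STEPS = ["person", "attributes", "identifiers", "observations"]


def register_chained(server):
    # The client chains one request per resource.
    for name in STEPS:
        server.post(name)


def register_bundled(server):
    # The client sends a single request carrying every resource.
    server.post_bundle(STEPS)


chained = FakeServer(fail_on="identifiers")
try:
    register_chained(chained)
except RuntimeError:
    pass
print(chained.saved)  # partial data the client now has to clean up or retry around

bundled = FakeServer(fail_on="identifiers")
try:
    register_bundled(bundled)
except RuntimeError:
    pass
print(bundled.saved)  # nothing was saved, so a plain retry is safe
```

With chained calls the client owns the recovery logic for every intermediate state; with a bundled call the failure leaves nothing behind.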

mksd · May 5, 2024, 11:11am

Thanks @dkayiwa for bringing this up.

In regards to the usual understanding of performance, it would be great to step back and analyse it through the lens of the current users’ experience of deployed OpenMRS instances out there. What do the users think? How critical can our users be of the performance of their EMR system through their daily use? I’m afraid this would be a tough exercise, as folks unfortunately get used to poor performance when they suffer it rather than complain about it (unless it’s way out there in the non-acceptable zone). But perhaps targeted research would uncover the main pain areas, if any.

On the other hand, as service providers of OpenMRS, we do have a long-lasting, real performance-related issue on our hands: we must, at times, restart the OpenMRS service because of memory leaks or other crashes. I would like to start researching this much more actively and uncover the root causes of those crashes. I will engage our engineering team at @Mekom to find out more and get data about this array of issues.

burke · May 5, 2024, 4:50pm

I think @dkayiwa is asking if there are any specific performance pain points for users that have risen to the attention of those, like yourself, supporting OpenMRS 3 implementations. If so, we want to prioritize analyzing & addressing those.

A primary goal for Platform 3 is to migrate away from using our custom class loader to start/stop modules. While using our own custom class loader for this purpose has the benefit of hot loading/restarting modules, it has some significant downsides: slower dev cycles, less predictable production environments, and – to your point – the introduction of memory leaks.

grace · May 7, 2024, 5:47am

The biggest thing I have heard from developers who are leading O3 implementations at live sites in their countries is exactly what @aojwang mentioned re. bundling requests and the inefficiency of one-by-one calls. So I agree with what Antony posted above.

Re. users’ experience of performance: Yes, the users are noticing that O3 can be slower on any given number of pages. It was painful watching this experience in multiple clinics in-person myself and seeing this sometimes slow down the registration or triage desk. This is why I raised this Talk post as far back as October as a request for us to use the performance tab in the dev tools to investigate the causes, since I suspected it’s death-by-1000-cuts with a number of various calls being less performant than others. Maybe Antony’s suggestion re. bundling would solve this user experience better.

On a communication note… I’m sorry if I wasn’t clear enough that I was seeing real production sites having this problem - re. “What do Implementers think”, this was a problem expressed by Implementers back in October and even earlier, which is why I escalated it. I was especially concerned that sites were still reporting problems in October, because that was even after the frontend fixes in esm-core v5 that significantly improved performance.

burke · May 8, 2024, 3:32pm

It would help if we could get to specific examples – e.g., a specific step in registration that is slow in real world settings. If we can replicate the slowness in demo environments, then we can set a specific performance target to hit. If we are unable to replicate the performance issue in our demo environments, then maybe we need to generate sufficient demo data to simulate the real world problem.

One example is @grace’s post on fetching locations, which led to realizing we needed to index locations and then to newer Tomcat performance issues.

For issues that turn out to be related specifically to moving too much data, I’d suggest we handle those under the topic of bandwidth usage.
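
A specific performance target of the kind described above could be checked along these lines. This is a rough sketch, not an existing tool: the helper names, the use of a 95th percentile, and the nearest-rank method are all assumptions, and `step` stands for whatever action is being measured (for example, a call that loads one registration step).

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def time_step(step, runs=20):
    """Run `step` repeatedly and return each duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        durations.append((time.perf_counter() - start) * 1000)
    return durations


def meets_target(durations_ms, target_ms, pct=95):
    """True when the chosen percentile is at or below the target."""
    return percentile(durations_ms, pct) <= target_ms


# Made-up timings in milliseconds; one slow outlier breaks a 500 ms target.
sample_durations = [100, 120, 110, 900]
print(meets_target(sample_durations, target_ms=500))
```

Judging against a high percentile rather than the average keeps occasional very slow requests, which are what users tend to notice, from being hidden.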

kmakombe · May 9, 2024, 7:20am

If the demo environment has high-end resources/specs, it might be difficult to replicate the slowness. Not sure if we can benchmark with some bare-minimum specs (different facilities will have different devices, some with low resources).

raff · May 9, 2024, 11:27am

The demo environment resources can be easily restricted since we run in docker containers and can put hard limits on memory and cpu.
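
As a small sketch of what such limits look like, the helper below assembles a `docker run` command using the `--memory` and `--cpus` flags. The image name and the limit values are placeholders chosen for illustration, not the actual demo configuration.

```python
def docker_run_args(image, memory="2g", cpus=1.0):
    """Build a `docker run` command line with hard memory and CPU limits."""
    return [
        "docker", "run", "--rm",
        "--memory", memory,   # hard cap on the container's memory
        "--cpus", str(cpus),  # how much CPU the container may use
        image,
    ]


# "example/backend-image" is a placeholder image name.
cmd = docker_run_args("example/backend-image", memory="2g", cpus=1.0)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine with Docker installed; the limits should be picked to mirror the hardware seen in the field.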

grace · May 30, 2024, 8:01pm

This is a very interesting idea. The O3 Demo environment currently has: RAM 16GB, Disk space 50GB, and 4 CPUs. However, according to @ibacher recently, we aren’t actually using all of this. I wonder if our demo environments should use a bit less of something, somewhere, so that they are a tighter representation of a typical implementation machine.

@kmakombe & @slubwama or @mmwanje & @mksrom - what do you find are typical memory or cpu limits of field hardware? Or - do you think 50GB disk and 4 CPUs is already fairly representative?

@jayasanka and I try to encourage manual QA testing before RefApp releases to be done using the browser dev tools’ “throttled” mode, but this is not the best performance QA solution.