Performance issues for cloud-hosted OMRS instances

ngoel2 · August 5, 2017, 7:20pm

My organization uses a cloud-hosted version of OpenMRS (RefApp 2.6 distro) for a telemedicine use case. On the user-facing side, we have a custom module and also a mobile client that use the REST API to interact with the OpenMRS server. We’ve hosted it on Digital Ocean as well as a local VPS provider.

Dimitri @mksd (whose organization also has cloud-hosted deployments) and I were discussing some common challenges we face when hosting OpenMRS in the cloud. One challenge for both of us is the excessive amount of RAM that OpenMRS consumes. In my organization’s case, we need at least 8GB of memory for a small telemedicine clinic deployment that has 10 users in total and no more than 6 concurrent users at a time.

We’re also facing performance issues when more than one client makes several REST API calls: the server becomes sluggish and unresponsive.

After a couple of days, the server’s memory consumption becomes very high and it becomes completely unresponsive; the only fix is to physically restart it. As a result, we’ve had to institute a policy of physically restarting the server every day.

I was wondering if anyone else has faced these issues and what might be causing them.

Tagging my other team members - @dbarretto @prithiraj69 @amalafrozalam

raff · August 7, 2017, 9:54am

Thank you, Neha, for bringing up these issues! I’d like to work with you to address them. Would it be possible for you to list some of your REST API calls that are particularly slow? For example:

GET 'http://qa-refapp.openmrs.org/openmrs/ws/rest/v1/concept?q=virus'
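One way to put numbers on "particularly slow" is to time each call a few times from a client machine. Here is a minimal Python sketch, assuming HTTP basic auth works against your instance. The helper names (`make_fetch`, `time_call`) and the credentials in the usage comment are placeholders for illustration, not part of OpenMRS.

```python
import base64
import time
import urllib.request


def make_fetch(username, password, timeout=60):
    """Build a function that performs an HTTP GET with basic auth and returns the body."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}

    def fetch(url):
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()

    return fetch


def time_call(fetch, url, repeats=5):
    """Call fetch(url) `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        durations.append(time.perf_counter() - start)
    return durations


# Example usage (placeholder credentials; not executed here):
# fetch = make_fetch("admin", "password")
# url = "http://qa-refapp.openmrs.org/openmrs/ws/rest/v1/concept?q=virus"
# print(time_call(fetch, url))
```

Reporting the slowest and the typical duration per URL would be enough to tell which calls are worth profiling first.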

It would also help to get more stats from your system so that I can simulate it locally. Could you please run the following MySQL queries and report back the results?

select count(*) from concept;
select count(*) from obs;
select count(*) from encounter;
select count(*) from visit;
select count(*) from patient;
select count(*) from provider;
select count(*) from users;
select count(*) from orders;

Finally, how much RAM is used right after you restart the server and it becomes responsive, and then after an hour and after a day of usage?
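If it helps, these readings can be collected automatically. Below is a rough Python sketch, assuming a Linux host and that you know the PID of the Java process running OpenMRS (Tomcat, for example). It reads the resident set size (`VmRSS`) from `/proc/<pid>/status`, which is process-level memory rather than JVM heap usage. The function names and the CSV path are made up for illustration.

```python
import time


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:   123456 kB".
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the current resident set size of a process in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kb(handle.read())


def log_rss(pid, interval_seconds=3600, samples=25, out_path="rss_log.csv"):
    """Append 'unix_timestamp,rss_kb' rows to a CSV file, one row per interval."""
    with open(out_path, "a") as out:
        for index in range(samples):
            out.write(f"{int(time.time())},{read_rss_kb(pid)}\n")
            out.flush()
            if index < samples - 1:
                time.sleep(interval_seconds)


# Example usage (hypothetical PID; not executed here):
# log_rss(12345)  # one sample right away, then hourly for 24 hours
```

Starting it right after a restart gives the baseline, the one-hour reading, and the one-day reading in a single file.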

wyclif · August 9, 2017, 8:45pm

Hi @ngoel2,

Thanks for reporting this. Do you run exactly one extra module? It would be good to run the instance without the extra module(s) to confirm that they aren’t the ones causing the memory issues.

ngoel2 · August 11, 2017, 9:46pm

Thanks @raff - I would love to work on this with you. Unfortunately, I am fully booked for the next 10 days with a training for a telemedicine deployment we are doing in India.

Can we take this up after August 30th, once the rollout is complete and I can give proper time to debugging this issue with you? We’re definitely very interested in fixing it.

ngoel2 · August 11, 2017, 9:51pm

@wyclif - good point, it may be our module that is introducing the memory issues. However, that wouldn’t explain why @mksd’s team is facing the same issues. They have hosted a Ref App distro and a Bahmni distro on AWS and see the same problem.

I’ll try to see whether the Ref App distro on its own shows the issue. The challenge is reproducing it: the memory consumption builds up over time, on an active cloud deployment with several users posting data every day. I haven’t been able to come up with a metric yet for when the performance slowdown begins to show up (e.g. when the number of encounters exceeds x, or the number of concurrent users is y).

@mksd and I came up with the idea of creating a “chaos” script that would randomly post data to OpenMRS to simulate high-volume use so that we can reproduce the issue, but that’s proving to be easier said than done!
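For discussion, here is a minimal Python sketch of what such a script could look like, under several assumptions: the server accepts HTTP basic auth, `POST /ws/rest/v1/obs` accepts a JSON body with `person`, `concept`, `obsDatetime` and `value`, and the concept used is numeric. The function names, search terms, UUIDs, base URL and value range are all placeholders, and this should only ever be pointed at a test instance.

```python
import base64
import json
import random
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

SEARCH_TERMS = ["virus", "fever", "cough", "pain"]  # arbitrary sample search terms


def random_request(rng, person_uuid, concept_uuid):
    """Return one random request as (method, path, body); body is None for GET."""
    if rng.random() < 0.5:
        return ("GET", f"/ws/rest/v1/concept?q={rng.choice(SEARCH_TERMS)}", None)
    body = {
        "person": person_uuid,    # placeholder: UUID of a test patient
        "concept": concept_uuid,  # placeholder: UUID of a numeric concept
        "obsDatetime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000+0000"),
        "value": rng.randint(35, 41),  # assumed plausible range for the concept
    }
    return ("POST", "/ws/rest/v1/obs", body)


def make_sender(base_url, username, password, timeout=60):
    """Build send(method, path, body); it returns the HTTP status and raises on HTTP errors."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}

    def send(method, path, body):
        data = json.dumps(body).encode() if body is not None else None
        request = urllib.request.Request(base_url + path, data=data, headers=headers, method=method)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status

    return send


def run_chaos(send, total_requests, workers, person_uuid, concept_uuid, seed=0):
    """Generate a seeded random request plan and send it using `workers` threads."""
    rng = random.Random(seed)
    plan = [random_request(rng, person_uuid, concept_uuid) for _ in range(total_requests)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda request: send(*request), plan))


# Example usage (placeholder URL, credentials and UUIDs; not executed here):
# send = make_sender("http://localhost:8080/openmrs", "admin", "password")
# run_chaos(send, total_requests=1000, workers=6,
#           person_uuid="<test-patient-uuid>", concept_uuid="<numeric-concept-uuid>")
```

Setting `workers=6` mirrors the 6 concurrent users mentioned above, and the fixed seed makes a run repeatable, so memory readings from different runs can be compared.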

mksd · August 12, 2017, 8:46am

Sorry to weigh in only now, but yes, my gut feeling is that it is not because of custom modules. On the particular OpenMRS instance that I have in mind, and although an upgrade would be welcome, the memory issues occurred with pretty much an out-of-the-box Ref App. The quick and dirty solution was an automatic restart every week (that was enough).

+1 I second the idea of the “chaos script” tailored to stress the Ref App.