Automated cloud hosting of OpenMRS.

atlcto · July 4, 2019, 2:28am

I wanted to share a project I’ve been working on, to get feedback and to see if anyone is interested in working on it too. My goal is to take the IT friction out of starting an OpenMRS instance. An end user who is running a clinic and wants to get started with an EMR may not want to have to learn about docker. For instance, someone who wants to start a blog can either go to WordPress.com and start a free site or go to WordPress.org and download the source code as a zip. My desire is to create a similar site for hosting OpenMRS instances (including multiple distributions/flavors). I was tempted to create an openmrs.com-style site, but there are obvious trademark issues, so I started with hostedehr.org. End users are able to self-enroll and provision subdomain sites, so ckwc.hostedehr.org would be for my organization (Connecting Kids With Care). As long as the name is not taken, users can create the subdomain and invite users to the OpenMRS instance.

Before you run off to visit hostedehr.org, yes there is a site there, and it somewhat works, but there is still work to be done before it’s ready to turn over to end users. I wanted to describe the architecture and get feedback.

First, the site itself is written in node-js and is connected to a simple backend database (in MySQL) to validate users’ e-mail addresses and keep track of who is the admin on which subdomain. Each OpenMRS instance is created as a separate database on a single “powerful” Amazon RDS server (we can upgrade this to an even better server as more and more users host installations on the site). The OpenMRS instances are also served from another single “powerful” Amazon server (Memory Optimized), as each instance needs at least 600 MB when it’s running. I started with the r5d.xlarge because it has 32 GB of memory and a 150 GB local drive that can be provisioned as a swap file to give it 182 GB of addressable memory. The application server itself is just running docker, so each OpenMRS instance can actually run a different OpenMRS distribution / docker image. The application data for each OpenMRS instance is mapped to a directory under an NFS mount point on the docker server. It’s using Amazon Elastic File System (EFS), so it starts out with 8 petabytes (8,192 TB) of storage space and will grow automatically. The docker server then provides the security so that each instance can only access its own files. With the data on the NFS drive we can actually destroy the docker server and rebuild it from an automated script, so it doesn’t really become a “single point of failure”.
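As a back-of-the-envelope check on those sizing numbers, here is a minimal Python sketch. The 600 MB per instance, 32 GB of RAM and 150 GB of swap come from the description above; the 2 GB reserved for the OS, docker and HAProxy is purely an assumption, and GB is treated as 1024 MB.

```python
def max_instances(ram_gb: float, swap_gb: float,
                  per_instance_mb: int = 600,
                  reserved_gb: float = 2.0) -> dict:
    """Estimate how many instances fit in RAM alone and in RAM plus swap.

    reserved_gb is an assumed allowance for the OS, docker and HAProxy.
    """
    usable_ram_mb = (ram_gb - reserved_gb) * 1024
    usable_total_mb = (ram_gb + swap_gb - reserved_gb) * 1024
    return {
        "in_ram": int(usable_ram_mb // per_instance_mb),
        "with_swap": int(usable_total_mb // per_instance_mb),
    }


print(max_instances(32, 150))  # {'in_ram': 51, 'with_swap': 307}
```

So roughly 50 instances fit in physical memory at the 600 MB minimum; anything beyond that would be running out of swap, which is only a theoretical ceiling rather than a comfortable operating point.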

Now, with all these servers running on docker, they can’t all have port 8080, so we also have HAProxy running on the box as an internal proxy server to route the HTTP requests to the proper docker instance. It also gives us a unique feature: the ability to start and stop servers based on inactivity. HAProxy also acts as the SSL proxy and serves the wildcard *.hostedehr.org SSL certificate, so clinics don’t have to worry about installing SSL, renewing certificates, etc. The subdomain-to-port mapping is queried from the docker server, and the HAProxy config file is generated from it to route each subdomain to the proper docker instance.

For all the servers that are not running, it routes the URL to a “special” node-js app we have running on the box. When that app gets a URL request, it calls the docker API to start the appropriate instance (or create it on first run), then waits for the server to start up. I’m not sure if we should increase the timeout on the original request to wait for the server to come up, or just return a loading page with an HTML refresh for the root URL or the login page, and return a 503 Service Unavailable for web service requests. In any case, when the server comes up, HAProxy sees it and puts it back in the config as a regular instance, and the “special” node-js app is no longer proxying the requests.
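To make the second option concrete, here is a small Python sketch of how the fallback app could pick a response while an instance is starting. It assumes web service calls can be recognised by the `/openmrs/ws/` path prefix and that every other path is a browser page; the 15 second retry interval and the response bodies are placeholders.

```python
from typing import Dict, Tuple

WS_PREFIX = "/openmrs/ws/"  # assumed prefix for web service requests


def response_while_starting(
    path: str, retry_seconds: int = 15
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a request to a stopped instance."""
    if path.startswith(WS_PREFIX):
        # API clients get a 503 and a hint about when to retry.
        headers = {
            "Retry-After": str(retry_seconds),
            "Content-Type": "application/json",
        }
        return 503, headers, '{"error": "instance is starting"}'
    # Browsers get a loading page that refreshes itself.
    body = (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{retry_seconds}">'
        "</head><body>Your OpenMRS instance is starting, please wait.</body></html>"
    )
    headers = {"Content-Type": "text/html", "Cache-Control": "no-store"}
    return 200, headers, body
```

The advantage over holding the original request open is that no proxy timeout has to be raised to cover the full startup time.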

Now, the server is a specific size, so there is a limit to the number of instances that can run on one box. HAProxy has a stats interface, so we can tell when an instance hasn’t received any HTTP requests in some number of days and then stop the instance in an automated fashion. All the data will still be saved to MySQL and the NFS mount, so as soon as a web request is made, the server will start itself back up (using the process described above). This allows us to only have the instances running that are actually in use, vs the 100s that may have been created at one point and not used. But we have all the data saved in case the end user decides to try keeping electronic records again.
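The inactivity check could look something like the Python sketch below. It assumes the stats are read in HAProxy’s CSV format and relies on the `lastsess` column (seconds since the last session, `-1` if there has never been one); the sample is cut down to the columns used here, and the backend names are made up.

```python
import csv
import io
from typing import List


def idle_backends(stats_csv: str, max_idle_days: float) -> List[str]:
    """Return backend names whose last session is older than max_idle_days."""
    # The CSV header line starts with "# ", which is stripped before parsing.
    reader = csv.DictReader(io.StringIO(stats_csv.strip().lstrip("# ")))
    limit_seconds = max_idle_days * 86400
    idle = []
    for row in reader:
        if row["svname"] != "BACKEND":
            continue  # only look at the per-backend summary rows
        last = row.get("lastsess") or "-1"
        # -1 means "no session yet"; such instances are skipped here because
        # the stats alone do not say how long they have been running.
        if int(last) > limit_seconds:
            idle.append(row["pxname"])
    return idle


sample = (
    "# pxname,svname,scur,lastsess\n"
    "be_ckwc,BACKEND,2,35\n"
    "be_demo,BACKEND,0,604800\n"
    "be_new,BACKEND,0,-1\n"
)
print(idle_backends(sample, max_idle_days=3))  # ['be_demo']
```

Each name returned would then be mapped back to its container and stopped through the docker API.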

Finally, if we ever got to the point where we had upgraded all the way to the r5d.24xlarge, which has 96 logical processors on 48 physical cores and 768 GB of memory, we could always create multiple docker servers and have HAProxy operate over multiple hosts.

The code is still a work in progress, but I have each part functional to some degree and plan to fully open source it after it is functional.

So, does anyone see any obvious problems? Do we think this would be a useful option for people who want OpenMRS as more of a SaaS solution than a pure software project? My non-profit (Connecting Kids With Care) is fine to cover the hosting costs for this project and to sign a patient confidentiality agreement with any organization that needs one.

cintiadr · July 4, 2019, 11:54am

Hi @atlcto,

I’ve been running all our test OpenMRS instances, so I have a little bit of advice to give.

That’s pretty nice. I suppose the obvious question is about infrastructure costs and how you plan on receiving enough money to keep them under control, but I don’t have any advice here.

Also, I run operations that hold health data for a living, so I’ve learned a thing or two. When handling health data, you have two priorities: security (privacy & confidentiality) and legislative compliance. Both come with their own problems.

I’m not sure about that part, but I think it might be easier for users if they are not misled into believing it’s community supported.

That’s ok, but that means one client can bring all customers down on the same RDS. It makes disaster recovery, backups and even staging a tricky operation, as the whole RDS is going to be encrypted using the same KMS key (which I ASSUME you configured, right?). It’s cheaper, but you are trading price vs operationalisation, security and a couple more things. Also, make sure to configure OpenMRS to talk to RDS via SSL. It’s a little bit boring with RDS. I also assume this RDS is not available from outside your VPC, and that it’s Multi-AZ.

I haven’t tried Aurora, but that might just be the thing you need.

From my experience running multiple JVMs, that looks like too little. You might eat way too much CPU on garbage collection. With swap enabled, it seems like a recipe for slow applications.

Note that you should only use swap if your instance has instance storage (and is not EBS-only).

It’s a shame OpenMRS keeps some state on the filesystem, because otherwise you could use Fargate.
When you keep your own instance, you need to address the following problems by yourself:

autoscaling (and any provisioning needed)
autohealing
centralised logs
OS patching
OS hardening
monitoring&alarming
encryption at rest
SSH access (and the security on it)

But you might prefer to have ECS instead, really. That might help you with a lot of the problems (except OS patching and hardening). CloudFormation support is pretty decent. It might be much cheaper and easier to maintain.

Sensible. Some people don’t trust EFS durability, but I don’t know what to use for backups. Might be mandatory for your context.

I sort of do the same for our community servers; another solution is deploying traefik or envoy. I use traefik and it’s great.

If you are using ECS or Fargate, that wouldn’t be a problem you have.

Again, you could automate that with letsencrypt, or just use AWS certificates on the ELB or similar, so you’d never have to rotate certificates.

I’d never put an EC2 directly on the internet without an ELB or ALB in front of it. More: I don’t feel particularly comfortable about hosting health data publicly without the network security layers. And without a WAF in front of it, I think it’s a bad idea.

Now, if you ask me, our docker images are not hardened for production (not internet/public facing). They are running as root, we don’t apply patches, and we didn’t clean up unused binaries. That would be awesome work to have done.

I also know that every country has different laws about where health data can be stored, how it can be accessed, how long it needs to be kept, and so on. It’s always a little bit of a nightmare to get any system compatible with HIPAA or ISM.

It would also probably be better to have a security engineer take a look at the new application. Note that OpenMRS itself hasn’t had a pentest in a while.

I’ve also learned that installing certain plugins can cause OpenMRS to go down (but I have monitoring). It’s not that easy, but you might want to make sure you limit memory and CPU for every docker container, to ensure you are not killing all hosts in one go.
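For the limits, a rough Python sketch of building the `docker run` command with per-container caps is below. `--memory` and `--cpus` are standard `docker run` flags; the container name scheme, the EFS mount root, the in-container data directory and the default limit values are all hypothetical placeholders.

```python
from typing import List


def docker_run_args(subdomain: str, host_port: int, image: str,
                    memory: str = "1g", cpus: float = 1.0,
                    data_root: str = "/mnt/efs",
                    container_data_dir: str = "/openmrs-data") -> List[str]:
    """Build a docker run command that caps memory and CPU for one instance.

    data_root and container_data_dir are placeholder paths, not the
    real layout of any OpenMRS image.
    """
    return [
        "docker", "run", "--detach",
        "--name", f"openmrs-{subdomain}",
        "--memory", memory,        # hard memory cap for this container
        "--cpus", str(cpus),       # CPU cap (number of cores)
        "--publish", f"127.0.0.1:{host_port}:8080",
        "--volume", f"{data_root}/{subdomain}:{container_data_dir}",
        image,
    ]


print(" ".join(docker_run_args("ckwc", 32768, "example/openmrs:latest")))
```

With a cap in place, a misbehaving instance gets killed or throttled by docker on its own instead of starving every other instance on the box.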

cintiadr · July 4, 2019, 12:02pm

Healthcare data is just too good for hackers:

https://www.popsci.com/why-do-hackers-want-your-health-data/

There are a lot of countries which forbid health data from leaving the borders of said country. Governments spend a lot of time and money on regulatory measures for health data for that very reason, and that’s why the rules change depending on your location.

If you are willing to get into the business of hosting health data, a lot of security effort will be needed.

atlcto · July 4, 2019, 12:58pm

Thanks @cintiadr. That is a lot of good feedback and I’ll definitely look into some of your suggestions in more detail. In the meantime just wanted to follow up with some more info to answer some of your points.

I mentioned my non-profit Connecting Kids With Care, which is actually an initiative of Jackson Healthcare. Jackson Healthcare makes over a billion dollars per year in revenue and fully supports our non-profit. I assume there would not be funding issues as long as the infrastructure costs relate to improving patient care.

I actually started with Fargate, but the cost was too high considering the CPU and memory requirements to run tomcat / OpenMRS. It worked out better to have a more powerful box that is shared between the instances than to try to define lower dedicated resources for each instance.

At this point the only publicly accessible port open on the box is 443 (configured with the security groups) and is connected to the HAProxy, and everything else is internally routed, but it’s a valid concern and easily mitigated with a WAF.

Yes, we might need to add a country drop-down when creating an installation to track which regulations might apply. I have been working with cloud hosted healthcare data for the last 7 or so years and before that worked on online casino software. In between there I had a startup that issued a mobile focused Mastercard debit card and ran a telemedicine startup, so I am familiar with complying with relevant security rules and regulations. In my college years I may have been a hacker… I mean a pentester… That is the reason I didn’t want to open source it until it was fully functional and had been reviewed for security concerns.

That will be part of the challenge, but hopefully by only allowing specific distributions and modules it will give us an opportunity to find those bugs and fix them, such that they become hardened and more reliable for the entire OpenMRS community.

dkayiwa · July 4, 2019, 1:22pm

This is an awesome initiative and thanks for sharing your progress @atlcto

I foresee quite a number of use cases for this. In the meantime, while setting this up, have you come across any problems which would be addressed by improving the architecture of OpenMRS? @cintiadr has already advised against keeping state on the file system.

cintiadr · July 4, 2019, 1:51pm

It would be really nice if you could spend some time hardening our docker images. I don’t think we have had anyone seriously thinking about hosting production OpenMRS using our docker image, available without any network security layer.

Well. Not being able to run multiple instances against the same DB is quite bad for production environments which require high availability.

cintiadr · July 4, 2019, 1:58pm

Also, having two endpoints (one that answers ‘ok’ and another one with application health information) could be super useful for setting up autohealing and monitoring. But much less important.
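Something along these lines, sketched in Python just to show the shape of the two endpoints. The paths `/ping` and `/health` and the check names are made up for illustration; OpenMRS does not expose these today.

```python
import json
from typing import Callable, Dict, Tuple


def handle(path: str, checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return (status, body) for the two hypothetical health endpoints."""
    if path == "/ping":
        # Cheap liveness answer: the process is up and serving HTTP.
        return 200, "ok"
    if path == "/health":
        results = {}
        for name, check in checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                results[name] = False  # a crashing check counts as unhealthy
        status = 200 if all(results.values()) else 503
        return status, json.dumps(results, sort_keys=True)
    return 404, "not found"


print(handle("/health", {"database": lambda: True, "modules": lambda: False}))
# (503, '{"database": true, "modules": false}')
```

A load balancer or autohealing rule would poll the cheap endpoint frequently, while monitoring would read the detailed one.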

atlcto · July 4, 2019, 2:14pm

It would be nice to have the startup time improved. Maybe that would be a good GSoC project for next year to profile the application startup and see if there are areas that can be optimized.

I think @isears did some work on a production docker image: Docker Reference Application in Production? - #14 by isears

cintiadr · July 4, 2019, 10:23pm

Yeah, he created his own docker image rather than improving the official ones (which are part of our SDK).

isears · July 5, 2019, 10:57am

Yeah, I’ve been running it in small-scale (10 concurrent users max) production for ~18 months now behind an nginx TLS proxy:

https://hub.docker.com/r/isears/openmrs-referenceapplication

Really the only “security hardening” that went into it was removing the demo data, disabling debug mode, and allowing users to specify a specific admin password rather than defaulting to Admin123. There’s definitely a lot more work that could be done.

Would love to see a community-supported production image in the future. I think that’s going to have to be the first step to using OpenMRS in a modern cloud environment.

@atlcto this is all very exciting, definitely keep the community up to date on any progress you make!