OpenMRS and AWS Instances

yadamz · August 22, 2018, 9:53pm

Hello, what type of AWS EC2 instance works best for OpenMRS? For example, is R5 better than M4?

The different EC2 instance types are listed here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html

I would appreciate any recommendations in this regard.

isears · August 22, 2018, 10:56pm

I actually just finished deploying OpenMRS on AWS!

At first, just to see if we could, we deployed on a t2.nano with a db.t2.micro rds instance for the database. Using these super-small instances allowed us to stay within free-tier limits, so our AWS usage was completely free. However, using the free-tier resources really isn’t possible if you have any more than 1 or 2 concurrent users, and even at that level, the system can be sluggish.

After that, we upgraded to a t2.small EC2 instance for the server with a db.t2.small RDS instance for the database. That worked well, and our AWS bill was only about $32/mo. Also, since we were using RDS, we didn’t have to put any work into managing database backups; the AWS-managed backup scheme was good enough for our purposes.

Ultimately, however, we really wanted to cut costs, so we ended up installing everything (server and database) on a single t2.small EC2 instance. Our AWS bill is closer to $15/mo. now, but we had to write our own cron jobs to manage database backups. Our deployment is completely dockerized; you can check it out here: https://github.com/fortitudoinc/fortitudoinc-infra. Feel free to (re)use any part of our Docker deployment.
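For anyone wondering what a cron-driven database backup might look like, here is a minimal Python sketch. It is not the script from the repository above. It assumes the database is MySQL, that it is named `openmrs`, that `mysqldump` is on the PATH, and that credentials come from an option file such as `~/.my.cnf`; the function names and the backup directory are made up for illustration.

```python
import subprocess
from datetime import datetime, timezone


def build_backup_command(db_name, backup_dir, now=None):
    """Return (argv, output_path) for a timestamped mysqldump of db_name."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    output_path = f"{backup_dir.rstrip('/')}/{db_name}-{stamp}.sql"
    argv = [
        "mysqldump",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={output_path}",
        db_name,
    ]
    return argv, output_path


def run_backup(db_name="openmrs", backup_dir="/var/backups/openmrs"):
    """Run the dump; raises CalledProcessError if mysqldump fails."""
    argv, output_path = build_backup_command(db_name, backup_dir)
    subprocess.run(argv, check=True)
    return output_path
```

A cron entry would then call `run_backup()` from a small wrapper script once a day; pruning old dumps and copying them off the instance are left out here.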

Note that we have a relatively low usage (best guess no more than 5 concurrent users at any one time), so if you expect more traffic, your mileage may vary!

yadamz · August 22, 2018, 11:20pm

Thanks very much Isaac. This is amazing information.

Do you have a rough estimate of the specification requirements for, let’s say, 10 concurrent users? It is possibly a basic question, but why did you use nginx?

isears · August 23, 2018, 12:34am

Hard for me to say exactly what 10 concurrent users will require, but I would start with a t2.small and upgrade to a t2.medium if you notice the system gets slow.

The only reason we used nginx was because the pre-built TLS proxy on dockerhub uses nginx. It handles all the TLS certificate management through letsencrypt pretty well.

suthagar23 · August 23, 2018, 1:43am

If you are going to deploy OpenMRS for heavy usage (approx. 50+ concurrent users at a time), then you should think about load balancing across instances. Nginx also handles this very well: it maps incoming requests to your nodes based on their availability.

At the same time, a small machine type (micro) would be enough for the staging environments, which are very important for cloud-native deployments.

yadamz · August 23, 2018, 1:58am

Thanks very much @suthagar23. I just want to clarify something you said: "the largest concurrent users which is 50 users". Are you saying that OpenMRS cannot handle more than 50 concurrent users?

suthagar23 · August 23, 2018, 2:08am

Nope, you are confusing concurrency with load balancing. OpenMRS can handle a good number of concurrent users, but its efficiency and the time taken to process requests depend on the machine used for the deployment. If you use a t2.nano, the machine can only process and manage a small number of requests at a time. If more requests come in than that, the extra users have to wait until the existing requests finish, so they will not get their responses as soon as they ask for them. That reduces usability and end-user satisfaction. BTW, if the number of requests is very high, the machine may crash, and then you need to manually restart the process to bring the services back.
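To put rough numbers on that waiting, here is a toy Python model (not a benchmark of OpenMRS). It assumes all requests arrive at once, the machine can work on `workers` requests in parallel, and every request takes the same `service_time`; the function and parameter names are made up for illustration.

```python
def wait_times(num_requests, workers, service_time):
    """Seconds each request waits before it starts being processed.

    Toy model: every request arrives at t=0, `workers` requests run in
    parallel, and each one takes exactly `service_time` seconds.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Request i has to wait for i // workers full batches ahead of it.
    return [(i // workers) * service_time for i in range(num_requests)]


if __name__ == "__main__":
    # 10 simultaneous requests, 2 at a time, 1.5 s each (made-up numbers).
    print(wait_times(10, 2, 1.5))
```

With a fixed number of slots, the wait grows in steps as more users pile on, which is the sluggishness you would see on a very small instance.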

So we manage multiple instances with Nginx to avoid those concurrent access issues. Hope you get it!

yadamz · August 23, 2018, 2:37am

Ahh @suthagar23 IC

I have never used NGINX before … I have Tomcat 8 installed and OpenMRS is already deployed. Can I go ahead and install NGINX? Do you have a good guide on how to install it? Do I have to do any maintenance after installation?

I appreciate your help …

suthagar23 · August 23, 2018, 2:46am

AFAIK NGINX is not really needed if you are not going to deploy for a huge crowd. If you have only one instance/node in AWS, then there is no use in going for NGINX.

If you plan to use NGINX for your deployments, then you need to create multiple nodes in AWS, and you may need some help from dedicated DevOps people who can manage those things easily.

So the better option would be to post your requirements here, along with the number of users you expect to use the system at a given time, and ask the DevOps folks for suggestions. I think @cintiadr could give more advice on that.

cintiadr · August 23, 2018, 12:00pm

Well, it always depends on how much you are willing to pay, how much load and high availability you need!

I’m afraid I’m going to confuse you more than help.

These are the official requirements for OpenMRS: https://wiki.openmrs.org/display/docs/System+Requirements

Java applications tend to be very memory intensive, and not really CPU intensive. Usually, the only CPU-heavy operation is garbage collection, and that mostly happens when the JVM is configured with too little memory.

By default, you either go with the M family or the T family. There’s now a T3, which might be more interesting for you than a T2.

The M family guarantees you consistent CPU shares; T family allows you to burst CPU a couple of hours per day, but it’s based on credits. It’s possible to set T2/T3 to be unlimited, and you pay if you use more credits than you had. T3 or M5, choose what you prefer. If you are going to run it for several months, think about buying reserved instances (they get cheaper).
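To get a feel for how the credit system plays out, here is a small Python sketch that tracks a credit balance hour by hour. It models the standard (not unlimited) mode and relies on one CPU credit being one vCPU at 100% for one minute. The earn rate and cap are parameters; the figures in the usage note are what I recall for a t2.small and should be checked against the current AWS documentation. The function name is hypothetical.

```python
def simulate_credits(earn_per_hour, max_balance, start_balance,
                     hourly_utilization, vcpus=1):
    """Return the credit balance after each hour of the given load.

    hourly_utilization is a list of average CPU utilizations (0.0 to 1.0),
    one per hour. One credit = one vCPU at 100% for one minute, so an hour
    at utilization u spends u * 60 * vcpus credits.
    """
    balance = float(start_balance)
    history = []
    for utilization in hourly_utilization:
        spent = utilization * 60 * vcpus
        # The balance is capped at max_balance and cannot go below zero;
        # at zero a standard-mode instance is throttled to its baseline.
        balance = min(max_balance, max(0.0, balance + earn_per_hour - spent))
        history.append(balance)
    return history
```

For example, `simulate_credits(12, 288, 288, [1.0] * 6)` models an instance that earns 12 credits per hour, starts with a full balance of 288, and then runs flat out for six hours. If the balance hits zero in your own numbers, that is the point where an M-family instance or unlimited mode starts to make sense.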

If you don’t want to maintain your own database, you can use the AWS one, RDS: either a single node without high availability (single-AZ) or a cluster (multi-AZ). I cannot tell you which size to choose, as I haven’t been running any OpenMRS production systems.

I know that OpenMRS maintains a folder with data, and I suppose there’s some documentation on how to back it up. You can think about using NFS/EFS for that folder, so you wouldn’t need any backup from the machine itself. If EFS is not a possibility, you could use EBS snapshots/backup daily on it (I think you’d need to write your own code here). Otherwise, a cron task to S3 is possible for sure.
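As a rough idea of the cron-task route, here is a Python sketch that packs a data folder into a dated `.tar.gz`. The folder path you pass in is an assumption (check where your OpenMRS installation actually keeps its data), the function name is made up, and the upload to S3 is deliberately left out.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_data_dir(data_dir, dest_dir, today=None):
    """Create <folder-name>-<YYYY-MM-DD>.tar.gz in dest_dir and return its path."""
    today = today or date.today()
    data_dir = Path(data_dir)
    dest = Path(dest_dir) / f"{data_dir.name}-{today.isoformat()}.tar.gz"
    with tarfile.open(str(dest), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(str(data_dir), arcname=data_dir.name)
    return dest
```

The returned path is what a follow-up step would copy to a bucket. Ideally the archive is taken while nothing is writing to the folder.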

Instead of using letsencrypt and nginx, you could set up ACM (AWS certs) + an ALB (AWS load balancer), which are pretty much free.

As you are handling health data, I’d still recommend setting up any self-signed certificate in your tomcat, so the ALB will talk https to your EC2 instance. E.g. https://hutter.io/2016/02/09/java-create-self-signed-ssl-certificates-for-tomcat/

I do handle a lot of health data in AWS, so I do have a lot more comments on security, but I don’t want to confuse you even more.

@suthagar23, can you load balance OpenMRS at all? I thought only single node installations were possible…

cintiadr · August 23, 2018, 12:03pm

I find this one less confusing than AWS ones: https://www.ec2instances.info/

Usually, new generations are supposedly better than older ones. So, M5 are newer and cheaper than M4. A T3 is better than a T2.

That said, sometimes the older generation is cheaper.

isears · August 23, 2018, 12:30pm

@cintiadr would love to hear your comments about security w/regards to AWS + OpenMRS. Is there a confluence page that documents any of this?

cintiadr · August 23, 2018, 12:47pm

I don’t think we have that.

Darius created "Playing around with OpenMRS in the cloud", but it was on Google Cloud.

OpenMRS is a pretty standard Java application; it’s not particularly special from the ops point of view (as far as I can see). What makes it very special is the type of data it’s holding: it’s health data, which is probably the most dangerous type of data we can handle.

But security is done in the very same way as any other sensitive application in AWS.

We are talking here about encryption at rest everywhere (encryption - via KMS - of EBS, databases, S3 buckets, EFS) and encryption in transit (TLS in all places). Security groups need to be pretty restrictive (only allow 443, for example), and you can apply security on the network level too (NACL rules) and enable network logs (VPC Flow Logs). You can enable GuardDuty to try to find any weird behaviour. Your S3 bucket policies should be very strict, and access to your AWS console should be strict too, with MFA.

Ideally your operating system is hardened (let’s say, with CIS benchmark or SELinux) and regularly patched. You should have audit logs and centralised logs. Access to machines should be controlled, secure and auditable.

If you want to protect your system from DDoS attacks and other threats (like SQL injection), you can add CloudFront + WAF in front of your application. It’s also a nice place to do IP whitelisting if desired. Another possibility is setting up a VPN tunnel between the place that should access the application and the AWS VPC, so the application is not publicly accessible.
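The "only allow 443" rule is easy to check mechanically. Here is a Python sketch that scans a list of ingress rules and returns the ones that open anything other than the allowed ports to the whole internet. The dictionary shape is modelled on the `IpPermissions` entries that the EC2 API returns for a security group, as far as I understand it; treat that shape as an assumption and adapt it to whatever your tooling gives you. The function name is made up.

```python
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def find_exposed_rules(ip_permissions, allowed_ports=(443,)):
    """Return ingress rules that expose non-allowed ports to any address."""
    exposed = []
    for rule in ip_permissions:
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if not any(cidr in WORLD_CIDRS for cidr in cidrs):
            continue  # not open to the world, ignore in this check
        from_port, to_port = rule.get("FromPort"), rule.get("ToPort")
        # An "all traffic" rule carries no port range; always flag it.
        if from_port is None or to_port is None:
            exposed.append(rule)
            continue
        if any(port not in allowed_ports
               for port in range(from_port, to_port + 1)):
            exposed.append(rule)
    return exposed
```

Anything this returns (SSH on 22 open to `0.0.0.0/0` is the classic one) is worth either closing or restricting to known addresses.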

But sure, everything has a cost.

mksd · August 23, 2018, 4:46pm

A lot has been said here so I will just add a very short contribution: t2.medium

That’s ignoring the newest T3 and giving a middle ground for production use. If you face a typical workload in the realm of “usual” OpenMRS implementations, then that’s your default to sleep tight at night.

If you don’t reserve it, that is about $42 monthly.
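For comparing instance types yourself, the on-demand monthly figure is just the hourly rate times the hours in a month. A minimal Python sketch; the rate you pass in has to come from the current price list for your region, and the function name is made up.

```python
HOURS_PER_MONTH = 730  # average month: 365 days * 24 hours / 12


def monthly_cost(hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Approximate on-demand monthly cost for one always-on instance."""
    return hourly_rate_usd * hours
```

This ignores storage, data transfer and any reserved-instance discount, so treat it as a lower bound on the compute part of the bill.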