I recently came across Google’s Cloud Spanner database, which seems to have a great value proposition (automagic, infinitely-scalable SQL database); I was very excited to try to apply it to OpenMRS, before reading more and realizing it’s nowhere near being a drop-in replacement for MySQL. But this got me interested to check out the latest cloud tools, and I want to share what I did, along with some broader thoughts:
(Note: a lot of this is not yet relevant for most implementations, because usually people run on-premise servers for local performance, and because AWS and GCP don’t have any data centers in Africa, and in-country hosting is often a legal requirement.)
Short summary:
- I set up a database using Google’s Cloud SQL. This tool is really impressive, and worth considering.
- I deployed OpenMRS using Kubernetes. (Since OpenMRS isn’t horizontally scalable this is pointless, but I was curious.)
More details:
(1) Google Cloud SQL
This is a very impressive tool. It gives you a fully-managed cloud instance of MySQL with automatic backups and security updates. And it has a really easy UI that lets you do complicated things in just a few clicks (e.g. create failover or read replicas, change the machine type, etc).
I would highly recommend considering this option, if you can do an offsite database, and want to do fancy things with it, without having much DBA expertise. (I haven’t really calculated the costs for a realistic deployment.)
AWS’s RDS offers the same value proposition, but the management UI seems more powerful/flexible and not nearly as easy.
Note: you need to install OpenMRS with binary logging turned off or else you get this error. You can enable binary logging (required for replication) once the initial database is created. (The problem is with the liquibase changeset with id 20090122-0853
.)
(2) Kubernetes
Kubernetes is a container orchestration system which lets you declaratively do things like “deploy docker image xyz behind a load balancer and spin up another instance when CPU usage gets too high”. It gets a lot of buzz (including an Adopt on the ThoughtWorks tech radar) so I wanted to play around with it.
Here’s a quick example of deploying the official RefApp 2.8 docker image, pointing to the Cloud SQL mysql instance from (1): https://github.com/djazayeri/openmrs-contrib-gcptest/tree/master/kubernetes
A few things I know are wrong (I’m sure there are more):
- this works on minikube but not on GKE (because I’m not sure how to find the IP from which the GKE openmrs will connect to mysql, so I can’t authorize it)
- should use an Ingress or LoadBalancer instead of NodePort
(3) Horizontal scalability?
In practice there’s no point in orchestrating a multi-node OpenMRS deployment with Kubernetes, because we don’t support horizontal scaling of either the application server or the database. We should really start to work on this, though! (Driven by actual usage patterns, of course.)
A few ways to approach scaling:
- The “correct” solution is to refactor our whole web application to remove/externalize local disk storage, or local state, so that we can just run multiple tomcats behind a load balancer with sticky sessions.
- Put a reverse-proxy cache in front of our REST web services (useful for distros built on client-side code)
- Use read replicas of the db for some things:
- refactor Patient Search and Concept Search to be microservices backed by a read replica instead of the master DB.
- have the reporting module automatically support using a read replica database
- route all GET requests to REST so they’re served from a read replica
(Many of these start to run up against the fact that one of OpenMRS’s success factors has been that it can be installed on very simple hardware, and I’m starting to describe approaches that need docker-compose at a minimum.)