GSoC 2026: Support for horizontal scaling of OpenMRS instances

Hi everyone

I’m Suubi Joshua from Kampala, Uganda. This summer I am working on support for horizontal scaling of OpenMRS instances as part of GSoC 2026. I will share weekly updates on this project to show my progress, flag any blockers, and answer questions or clarify what I am doing along the way.

What is this project about?

Historically, many OpenMRS implementations have relied on traditional, monolithic single-server setups. While this works well for individual clinics, massive national-scale health networks or regional hospital systems can quickly push a single app server to its limits, creating performance bottlenecks and risky single points of failure.

This project aims to completely blueprint and build a production-ready, highly available, and cloud-native scaling architecture for OpenMRS on Kubernetes. By moving away from rigid, manual configuration patterns and leveraging modern infrastructure orchestration, the goal is to ensure OpenMRS can handle massive spikes in clinical traffic while keeping patient data entirely self-healing and continuously online.

How it works

To run OpenMRS across multiple servers simultaneously, I am breaking down the traditional “all-in-one” model into independent, coordinated layers:

  • Smart Startup: Automatically coordinates servers during simultaneous boots so they don’t fight over configuration files. The first server handles the one-time setup while the others wait their turn to start smoothly.

  • Shared Sessions: Syncs active user logins across the entire server pool so clinicians never get suddenly logged out if the system shifts their traffic to a different replica.

  • Unified Storage: Connects servers to a shared cloud storage network so patient documents or lab charts uploaded to one replica are instantly accessible to all others.

  • Clustered Database & Search: Moves from single database and search instances to cooperative clusters. This automatically balances heavy clinical traffic and keeps searches lightning-fast.

  • Advanced Routing: Isolates traffic management from the core application code, allowing deployments to seamlessly route clinical traffic and safely test out new software updates on a small subset of servers.
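To make the "Smart Startup" idea above concrete, here is a minimal sketch of the pattern, not the project's actual implementation. It simulates replicas as threads in one process: the first replica to grab a lock runs the one-time setup, and the others wait until it signals completion. The names `StartupCoordinator` and `boot_replicas` are hypothetical, and in a real cluster the lock would have to live somewhere shared between pods rather than in process memory.

```python
import threading


class StartupCoordinator:
    """Lets exactly one replica run one-time setup while the rest wait."""

    def __init__(self):
        # In-process stand-ins (assumption): a real deployment needs a lock
        # that is shared between pods, not between threads.
        self._lock = threading.Lock()
        self._setup_done = threading.Event()
        self.setup_runs = 0

    def start_replica(self, name, started):
        # Non-blocking acquire: only the first caller wins the lock.
        # The lock is deliberately never released, so setup runs once.
        if self._lock.acquire(blocking=False):
            self.setup_runs += 1        # the one-time setup would happen here
            self._setup_done.set()      # let the waiting replicas continue
        else:
            self._setup_done.wait()     # wait for the first replica to finish
        started.append(name)


def boot_replicas(count):
    """Boot `count` simulated replicas at once; return (setup runs, names)."""
    coordinator = StartupCoordinator()
    started = []
    threads = [
        threading.Thread(
            target=coordinator.start_replica,
            args=(f"replica-{i}", started),
        )
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return coordinator.setup_runs, sorted(started)
```

Calling `boot_replicas(4)` starts four simulated replicas together; the setup step runs once and all four still finish booting.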

Key Goals

  1. Dynamic Scaling: Establish automated rules that seamlessly spin up new OpenMRS server replicas during peak clinical hours and spin them down when traffic drops.

  2. Zero-Downtime Upgrades: Fully transition our database and search layers to automated management systems capable of self-healing and running maintenance without going offline.

  3. Seamless User Experience: Ensure absolute data consistency across all server replicas so clinicians experience zero session drops or missing file attachments.

  4. End-to-End Security: Enforce mandatory data encryption for all internal communications between servers by default.
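For the Dynamic Scaling goal, the core rule the Kubernetes Horizontal Pod Autoscaler applies is `desired = ceil(current_replicas * current_metric / target_metric)`, skipped when the ratio is within a small tolerance of 1. The sketch below reproduces that arithmetic only; the replica bounds and the example metric values are assumptions for illustration, not settings from this project.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the HPA scaling rule.

    min_replicas and max_replicas are illustrative bounds (assumption).
    """
    ratio = current_metric / target_metric
    # Within the tolerance band, leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, two replicas averaging 90% CPU against a 60% target give `desired_replicas(2, 90, 60) == 3`, while four replicas at 30% against the same target scale down to 2.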

Resources

Thank you for your support so far.

cc: @raff @beryl @jayasanka


Community Bonding Update

Hi everyone!

Now that the official coding phase has kicked off, I want to share a recap of what has been happening behind the scenes during the Community Bonding period.

My mentor @raff and I used this time to align on the technical roadmap and architecture. We dove straight into the code to clear out legacy infrastructure blockers and lay down the foundational architectural pieces so that true horizontal scaling can happen smoothly.

Here is a breakdown of the milestones we have reached and the pull requests (PRs) we have merged across our primary project repositories:

:hammer_and_wrench: What’s Been Done So Far

  • Mentor Alignment & Architectural Reviews: Held productive sync sessions to align on our implementation strategies, review the overall scope, and establish a clear development pipeline for the weeks ahead.

  • Built and Published Custom Elasticsearch Image (openmrs-contrib-elasticsearch PR #1): To support cluster-wide search functionalities natively, I built a custom Dockerfile that pre-installs the analysis-phonetic plugin. I also wired up a GitHub Actions workflow to automatically publish this image to Docker Hub. This has already been successfully reviewed and merged! Credit to @ibacher for the guidance on this.

  • Eliminated Startup Crash-Loops (openmrs-contrib-cluster PR #10): I resolved a critical race condition where application servers would crash-loop if they tried to boot up before the background search index cluster was fully ready. By introducing smart readiness checks, the servers now wait patiently, paving the way for safe, automated scaling (HPA).

  • Upgraded to the Official MariaDB Operator (openmrs-contrib-cluster PR #9): We completely ripped out legacy database blueprints and replaced them with a cutting-edge MariaDB Galera cluster. Database scaling is now entirely automated; the system natively supports synchronous data replication across multiple active database nodes.

  • Migrated to the Official Elastic Operator (openmrs-contrib-cluster PR #8): To ensure search operations scale smoothly, we moved search management over to the official enterprise operator, migrating from legacy Bitnami blueprints. This upgrade enforces secure, encrypted communications by default and allows implementations to split up heavy search workloads onto dedicated servers.
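The readiness checks mentioned for PR #10 follow a general "wait until the dependency is healthy" pattern. The sketch below illustrates that pattern and is not the code from the PR: `fetch_health` is a hypothetical callable that returns the parsed body of Elasticsearch's cluster health response, and treating a `yellow` status as ready is an assumption.

```python
import time


def search_cluster_ready(health):
    """Elasticsearch reports cluster status as green, yellow, or red."""
    # Assumption: yellow (all primary shards allocated) is good enough to boot.
    return health.get("status") in ("green", "yellow")


def wait_until_ready(fetch_health, timeout_s=60.0, interval_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the search cluster is ready; return the number of attempts.

    fetch_health is a hypothetical callable returning the health dict.
    Raises TimeoutError if the cluster is not ready before the deadline.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if search_cluster_ready(fetch_health()):
            return attempts
        if clock() >= deadline:
            raise TimeoutError("search cluster not ready in time")
        sleep(interval_s)  # back off before polling again
```

The `sleep` and `clock` parameters exist so the loop can be exercised without real waiting; in a deployment, the same wait-then-start behaviour is more commonly expressed through Kubernetes probes or init containers than through application code.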

:rocket: What’s in the Pipeline (Current Work)

  • Modern Routing via Traefik & Gateway API (openmrs-contrib-cluster PR #12 - Open): Right now, I am actively working on replacing the old network routing setup with the modern Kubernetes Gateway API powered by Traefik v3. This will cleanly isolate traffic management from our application code, unlocking capabilities for safer software updates (like Canary or Blue/Green rollouts).

  • Re-architecting Shared Storage & Migrating from MinIO: I am currently evaluating a critical transition for our S3 object storage layer. Because the open-source community track for MinIO has been frozen and permanently archived, continuing to depend on it introduces long-term security debt and unpatched vulnerabilities. To solve this, I am exploring and gathering community feedback on replacing it with a secure, actively maintained, and lightweight alternative, specifically SeaweedFS or Rook/Ceph, to serve as our new unified, S3-compatible cloud storage engine.

:construction: Blockers & Next Steps

Fortunately, there are no blockers! The community bonding period has been incredible for aligning closely with my mentors and landing these prerequisite building blocks.

As we transition into the core coding phase, my immediate next steps are:

  1. Finalizing and merging the modern routing updates.

  2. Resolving the S3 object storage migration based on community and mentor feedback.

  3. Beginning formal load-testing to calibrate automated scaling rules for the core application servers.

A huge thank you to my mentor @raff for the stellar guidance, technical alignment, and rapid code reviews so far.

Medium Blog


Update

Hi everyone!

Here is a recap of what went down in Week Two of the official coding period.

This was a lighter week by commit count but a heavier one by decision quality. Two major items moved forward: a significant routing upgrade got merged, and our long-running storage debate reached a community consensus.

What Shipped

Migrated Edge Routing from Nginx Ingress to Traefik Gateway API (openmrs-contrib-cluster PR #12 - Merged): Replaced the legacy ingress-nginx controller with the Kubernetes Gateway API powered by Traefik v3.x. Key highlights:

  • Introduced a new helm/traefik-gateway/ chart as a cluster-wide, reusable gateway
  • Replaced all kind: Ingress resources with native HTTPRoute resources in both the backend and frontend charts
  • Preserved legacy Ingress templates (disabled by default) for compatibility with K8s providers that still rely on classic ingress
  • Solved a multi-replica login failure caused by CSRFGuard token mismatch across pods by implementing cookie-based sticky sessions via Gateway API + TraefikService, with automatic activation when replicaCount > 1
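To show why sticky sessions fix the token mismatch, here is a small simulation of cookie-based affinity. It is a sketch of the general technique, not Traefik's implementation: the `StickyRouter` class and the cookie name `openmrs_pod` are hypothetical. A client without a valid cookie is assigned a pod round-robin; afterwards its cookie pins it to that pod, so a CSRF token issued by one pod is validated by the same pod.

```python
import itertools


class StickyRouter:
    """Pins each client to one pod using a cookie."""

    COOKIE = "openmrs_pod"  # hypothetical cookie name

    def __init__(self, pods):
        self._pods = list(pods)
        self._round_robin = itertools.cycle(self._pods)

    def route(self, cookies):
        """Return (chosen pod, cookies to send back to the client)."""
        pod = cookies.get(self.COOKIE)
        # No cookie yet, or it names a pod that no longer exists:
        # fall back to round-robin and pin the client to the new choice.
        if pod not in self._pods:
            pod = next(self._round_robin)
        return pod, {**cookies, self.COOKIE: pod}
```

A first request with no cookie goes to the next pod in rotation; replaying the returned cookie on later requests keeps landing on that same pod.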

In Review

Replace archived Bitnami MinIO with SeaweedFS (openmrs-contrib-cluster PR #13 — Open): Following the community discussion on Replacing MinIO with SeaweedFS or Rook/Ceph, and with clear direction from @raff and community member @ranidunethma in favour of SeaweedFS for its simplicity and lightweight developer experience, this PR replaces the unmaintained Bitnami MinIO subchart with the official SeaweedFS chart. Currently addressing one review comment from @raff on whether the filer component is needed in our S3-only setup.

Next Steps

  • Land PR #13 (SeaweedFS)
  • Begin the openmrs-operator Helm chart, the automation layer that bootstraps all prerequisites so a new cluster deployment is a single helm install away
  • Begin application-level readiness coordination to ensure the OpenMRS app layer waits correctly for all dependent services

Medium blog.
