Demo: Voided Observation Archiving — Request for Community Feedback

twiine · March 16, 2026, 3:46am

Hi OpenMRS Community

I’m a GSoC 2026 applicant exploring the idea of archiving voided observations. Before going further, I wanted to share a working demo and hear your thoughts.

Watch the Demo

>> Click here to watch the demo <<

The demo covers the full flow: dry-run preview, manual archiving, scheduled jobs, analytics dashboard, and restoring archived observations.

The jobs Scheduler

Observations Analytics

Configurations

The Problem

In large OpenMRS deployments, the obs table accumulates millions of voided rows that still get scanned on every query. This slows down patient lookups, reports, and concept searches — even though that data is clinically inactive.

Technical Strategy (High Level)

I evaluated several approaches before settling on one:

Approach	Why I chose or passed on it
Separate archive table (chosen)	Voided obs are moved from `obs` to `obs_archive` via `INSERT...SELECT` + `DELETE`. The archive table mirrors the `obs` schema exactly, so every column is preserved and restoration is straightforward. The `obs` table immediately gets smaller — queries get faster, backups get lighter.
Table partitioning	MySQL requires the partition key in the primary key. Partitioning `obs` by `voided` or `date_voided` would mean altering the primary key of the most critical table in OpenMRS — too invasive, and partition pruning only helps queries that filter on the partition key, which most obs queries don’t.
JSON / schema-less archive	Serializing obs rows to JSON would lose the relational structure and make selective restores expensive (parse JSON, validate, re-insert). It also makes it harder to query the archive for analytics or auditing. Keeping the same relational schema means you can query `obs_archive` with the same SQL you’d use on `obs`.
Soft delete flag (`archived = true`)	This is essentially what `voided` already is. Adding another boolean doesn’t reduce table size or improve scan performance — the rows stay in `obs`.
Physical deletion (purge)	Permanently destroying clinical data is a non-starter in healthcare. Even voided observations may need to be recovered for audit, legal, or clinical reasons.

I’d Love Your Feedback

This is where you come in. I want to make sure this solves real problems for real deployments, not just a demo scenario.

Implementers — How big is your obs table? How many voided rows do you typically have? Would a 30-day default retention period work for you?
Mentors — Does this approach align with the direction you’d want for this GSoC project? Any architectural concerns I should address early?
Community — Are there other tables in your deployments that reference obs that I should handle during archiving? Any restoration scenarios I haven’t thought of?
Everyone — What would make you actually trust and use this feature in production?

CC @dkayiwa , @samuel34 , @mherman22 , @ibacher , @wikumc , @jayasanka , @jnsereko , @chibongho, @dkigen, @raff , @icrc.psousa

sempereza · March 16, 2026, 5:19am

This is wonderful it’s great I love your dedication bro keep it up