We’ve been drafting a design for openmrs-module-querystore — a separate, denormalized, read-side projection of OpenMRS clinical data, intended as shared infrastructure for any consumer that needs query patterns core’s transactional schema doesn’t serve well. The visible use cases today are AI/ML pipelines, semantic + keyword chart search, cross-patient cohort queries, and analytics, but the design isn’t tied to them — anything that needs a read-optimized view of clinical data is in scope.
This is early and very much a work in progress — no implementation yet, on purpose. Nothing in the design is locked in: every decision in the ADR is open to revision based on community input. The questions below are where we’d most like feedback first, but they’re not the only ones up for discussion — push back on anything.
Repo (design docs only at this stage):
GitHub - openmrs/openmrs-module-querystore: OpenMRS Query Store Module — CQRS read-side projection of clinical data for AI, analytics, and reporting
Architectural decisions: docs/adr.md
The current proposal in one paragraph
Apply CQRS: core remains the source of truth; a second store maintains an eventually-consistent read-side projection optimized for queries. The current candidate for the backing store is Elasticsearch (hybrid BM25 + dense-vector kNN + structured filtering in one system), but the choice is open — if something fits OpenMRS’s deployment realities better, the design should accommodate it. Data lives in per-type indices under an openmrs_* namespace. Each document has three parts: a plain-text serialization of the record, a vector embedding generated from that text, and structured metadata for filtering. Sync is events first, with AOP only as a scoped gap-filler. The v1 consumer surface is a Java service using OpenMRS’s standard @Authorized privilege annotations; REST and FHIR layers are deferred from v1 but explicitly additive — they layer on top of the same service when we get to them.
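To make the proposed consumer surface and document shape concrete, here is a minimal sketch. Everything here is hypothetical: the `@Authorized` stand-in, the service and record names, the privilege string, and the embedding dimension are illustrative only, not the actual OpenMRS API or a settled design.

```java
import java.util.List;
import java.util.Map;

// Stand-in for OpenMRS's @Authorized privilege annotation (hypothetical).
@interface Authorized { String[] value() default {}; }

public class QueryStoreSketch {

    // One projected document: the three parts described in the proposal.
    public record ProjectedDocument(
            String plainText,               // plain-text serialization of the record
            float[] embedding,              // dense vector generated from plainText
            Map<String, Object> metadata) {} // structured fields for filtering

    // Hypothetical v1 Java service surface; privilege name is illustrative.
    public interface QueryStoreService {
        @Authorized({"Get Patients"})
        List<ProjectedDocument> search(String queryText,
                                       Map<String, Object> filters,
                                       int limit);
    }

    public static void main(String[] args) {
        ProjectedDocument doc = new ProjectedDocument(
                "Patient reports persistent cough; chest X-ray ordered.",
                new float[384], // dimension depends on the embedding model
                Map.of("patientUuid", "example-uuid",
                       "encounterType", "Consultation"));
        System.out.println(doc.metadata().get("encounterType")); // Consultation
    }
}
```

The point of the sketch is that consumers program against a plain Java service guarded by standard privilege annotations; REST and FHIR layers would later wrap this same interface rather than talk to the backing store directly.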
The most concrete near-term consumer is the existing chartsearchai module, which today maintains its own ES pipeline. A migration analysis identifies what would need to change on each side.
What we’d most like feedback on
- Module or core? The current draft sits as a module on the grounds that not every deployment needs it, the backing-store choice shouldn’t bind core, and search/analytics dependencies don’t belong in every deployment. But the counter-argument is real — if this becomes the standard read surface across consumers, a module makes it optional infrastructure each deployment must add. Should this live in core instead, or stay a module?
- Is CQRS the right framing at all? It adds a second system and eventual consistency. Worth it, or should we push harder on the transactional database?
- What should the backing store be? Elasticsearch is the current candidate (mature, hybrid keyword + vector + structured filtering in one system, already in use by some deployments), but the ~1–2 GB memory floor is non-trivial in low-resource settings. Alternatives worth considering: PostgreSQL + pgvector, OpenSearch, a dedicated vector DB paired with a separate keyword index, or something else entirely. What would fit your deployments better?
- Plain-text serialization over FHIR. Chosen for token efficiency and embedding quality. Right call, or should we offer a FHIR projection too?
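To illustrate the trade-off in that question, here is a toy serializer (field names and sentence format are invented for this example, not the proposed format): a FHIR `Observation` resource carrying the same fact is typically a few hundred tokens of JSON, while a one-line rendering is around a dozen tokens and reads naturally to an embedding model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlainTextSerializationSketch {

    // Hypothetical serializer: renders an obs as one compact sentence
    // instead of FHIR JSON, trading structure for token efficiency.
    public static String serializeObs(Map<String, String> obs) {
        return String.format("%s: %s %s (recorded %s)",
                obs.get("concept"), obs.get("value"),
                obs.get("units"), obs.get("date"));
    }

    public static void main(String[] args) {
        Map<String, String> obs = new LinkedHashMap<>();
        obs.put("concept", "Systolic blood pressure");
        obs.put("value", "142");
        obs.put("units", "mmHg");
        obs.put("date", "2024-03-18");
        System.out.println(serializeObs(obs));
        // Systolic blood pressure: 142 mmHg (recorded 2024-03-18)
    }
}
```

A FHIR projection could still be layered on separately; the question is whether the store itself should carry both representations.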
- Coarse `GET_PATIENTS` authorization in v1. Cross-patient results may include patients the caller couldn’t read individually via core — `dataFilter` and location-based ACLs are not honored. Acceptable starting point, or deal-breaker?
- Sync reliability. Durable subscription, dead-letter handling, reconciliation against drift — anyone who’s built event-driven sync on top of the OpenMRS Event module: what bit you?
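On the sync-reliability question, the failure-handling shape we have in mind can be sketched as follows. This is not the OpenMRS Event module API; the class, method names, and retry policy are placeholders for the pattern: bounded retries, then park the event for the reconciliation job rather than drop it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

public class SyncSketch {

    // Hypothetical event-sync skeleton: retry a bounded number of times,
    // then park the event UUID in a dead-letter queue so the drift-
    // reconciliation job can replay it later, instead of losing the update.
    static final int MAX_RETRIES = 3;
    final Deque<String> deadLetter = new ArrayDeque<>();

    public void handle(String eventUuid, Consumer<String> projector) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                projector.accept(eventUuid); // re-read source record, upsert projection
                return;
            } catch (RuntimeException e) {
                // transient failure (store down, mapping error): retry, then park
            }
        }
        deadLetter.add(eventUuid);
    }

    public static void main(String[] args) {
        SyncSketch sync = new SyncSketch();
        sync.handle("obs-123", uuid -> { throw new RuntimeException("store unavailable"); });
        System.out.println(sync.deadLetter); // [obs-123]
    }
}
```

If you have run a variant of this in production (at-least-once delivery, idempotent upserts, periodic full reconciliation), we would especially like to hear where it broke.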
The ADR’s Open Questions section lists items we haven’t formed a view on yet (patient merge, bootstrap, long-text chunking, embedding model versioning, complex obs, PII scopes, concept-set queries, time-zone convention, Person vs Patient model). Input on any of them is welcome — and accepted decisions in the ADR are equally open to challenge.
Concerns about the premise (“do we even want this?”) are explicitly welcome too.