Document Metadata model for OpenMRS over FHIR

angshuonline · October 28, 2025, 11:05am

Hello All,

We have a requirement to upload and store documents (PDFs, scans, etc.) for patients/persons. These documents will be clinical most of the time, but some may be non-clinical, such as insurance details. Patients often bring past documents before procedures (an old diagnostic report, vaccination details, or sometimes even a clinical note), and these need to be uploaded. Such documents did not originate at the hospital/clinic of service and may have no matching references (practitioner, location, etc.) in OpenMRS.

We are considering storing the metadata for such documents in a model based on the FHIR DocumentReference resource. The OpenMRS platform does not have a structured way to record and manage documents with metadata, so we believe it will be beneficial for all if we can agree on a common model and source.

Below is a suggested DB model for the same.

Note:

document_reference.person_id - the reference is to a person, not a patient. We want to keep this option open.
document_reference.master_identifier - this has been removed in R6, but it is an important document attribute (e.g. insurance policy number) for subsequent searches. When we move to R6, I think we can merge it into the identifier list under some “system”.
document_reference.date_created - translates to DocumentReference.date. document_reference.date_started and document_reference.date_ended form the basis for “DocumentReference.period” (a top-level period is not in R4, but is present in R6, so for now we would model this as an extension).
document_reference.order_id - links to the order (service request) that may have resulted in this document (e.g. a Mental Health Assessment Report, classified as super private). This maps to FHIR R6 DocumentReference.basedOn, which isn't there in R4, so we are going to use an extension for this.
document_reference.encounter_id - maps to FHIR DocumentReference.context.encounter. The structure has changed in R6, where it is a simple reference. We are suggesting only an encounter reference anyway, so this should be fine.

The content is separated out into a different table, because the same document may be available in more than one format and content type. FHIR suggests the same.

document_reference_content.content_type stores the mimeType (e.g. application/pdf, image/jpeg etc).
document_reference_content.content_url - stores the actual reference URL where the document is physically stored.
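
To make the mapping in the notes above concrete, here is a minimal sketch (in Python purely for brevity; the module itself would be Java) of how one document_reference row and its content rows might be rendered as an R4 DocumentReference. The extension URLs, the `to_fhir_r4` function name, and the `*_uuid` keys (standing in for already-resolved foreign keys) are assumptions for illustration, not part of the proposal. The subject is rendered as a Patient reference here.

```python
from datetime import datetime, timezone

# Hypothetical extension URLs - the thread does not define them.
EXT_PERIOD = "http://fhir.example.org/ext/document-reference/period"
EXT_BASED_ON = "http://fhir.example.org/ext/document-reference/based-on"


def to_fhir_r4(row, contents):
    """Build an R4 DocumentReference (as a dict) from one document_reference
    row and its document_reference_content rows, both given as dicts."""
    resource = {
        "resourceType": "DocumentReference",
        "id": row["uuid"],
        "status": row["status"],
        # date_created -> DocumentReference.date; assumes a timezone-aware datetime
        "date": row["date_created"].isoformat(),
        # One content entry per non-voided content row
        "content": [
            {"attachment": {"contentType": c["content_type"], "url": c["content_url"]}}
            for c in contents
            if not c.get("voided")
        ],
    }
    if row.get("master_identifier"):
        resource["masterIdentifier"] = {"value": row["master_identifier"]}
    if row.get("subject_uuid"):
        resource["subject"] = {"reference": "Patient/" + row["subject_uuid"]}
    if row.get("encounter_uuid"):
        # R4 keeps the encounter under context, as a list of references
        resource["context"] = {
            "encounter": [{"reference": "Encounter/" + row["encounter_uuid"]}]
        }

    extensions = []
    period = {}
    if row.get("date_started"):
        period["start"] = row["date_started"].isoformat()
    if row.get("date_ended"):
        period["end"] = row["date_ended"].isoformat()
    if period:
        extensions.append({"url": EXT_PERIOD, "valuePeriod": period})
    if row.get("order_uuid"):
        # Stand-in for R6 DocumentReference.basedOn
        extensions.append({
            "url": EXT_BASED_ON,
            "valueReference": {"reference": "ServiceRequest/" + row["order_uuid"]},
        })
    if extensions:
        resource["extension"] = extensions
    return resource
```

Voided content rows are skipped, and the extensions are only emitted when the underlying columns are populated, so a document with no order or period produces a plain R4 resource.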

There are certain other attributes we may require, and I don’t know if an Attribute Type model for DocumentReference is preferable. One example is the issuing authority: some may relate that to FHIR DocumentReference.custodian, but often the EMR system would not have such referential relations set up, and just capturing the “name” of the issuer is enough within the clinic/hospital.

Please let us know your thoughts, feedback, and ideas.

erDiagram
    document_reference ||--o{ document_reference_content : "has"
    document_reference }o--o| concept : "type_concept"
    document_reference }o--o| concept : "security_concept"
    document_reference }o--o| provider : "author"
    document_reference }o--o| location : "location"
    document_reference }o--o| encounter : "encounter"
    document_reference }o--o| orders : "order"
    document_reference }o--o| person : "person"
    document_reference }o--|| users : "creator"
    document_reference }o--o| users : "changed_by"
    document_reference_content }o--|| users : "creator"
    document_reference_content }o--o| users : "changed_by"
    document_reference_content }o--o| users : "voided_by"

    document_reference {
        int id PK "auto_increment"
        char(38) uuid UK "not null"
        varchar(100) master_identifier
        varchar(50) status "not null"
        varchar(50) doc_status
        int type_concept_id FK
        datetime date_started
        datetime date_ended
        int author_id FK
        int location_id FK
        text description
        int security_concept_id FK
        int encounter_id FK
        int person_id FK
        int order_id FK
        int creator FK "not null"
        datetime date_created "not null"
        int changed_by FK
        datetime date_changed
    }

    document_reference_content {
        int id PK "auto_increment"
        int document_reference_id FK "not null"
        varchar(255) content_type
        varchar(512) content_url
        boolean voided "default false, not null"
        int creator FK "not null"
        datetime date_created "not null"
        int changed_by FK
        datetime date_changed
        int voided_by FK
        datetime date_voided
        varchar(255) void_reason
        char(38) uuid UK "not null"
    }

    concept {
        int concept_id PK
    }

    provider {
        int provider_id PK
    }

    location {
        int location_id PK
    }

    encounter {
        int encounter_id PK
    }

    orders {
        int order_id PK
    }

    person {
        int person_id PK
    }

    users {
        int user_id PK
    }

@ibacher @dkayiwa @burke @mksd @mohant @akhilmalhotra @grace

angshuonline · October 28, 2025, 3:37pm

Update:

document_reference.subject_id must refer to a patient for now. Otherwise it causes conflicts, because an encounter is only mapped to a patient in OpenMRS. Supporting a person reference is doable, but right now we are taking the easy way out. The field will be named subject_id.

erDiagram
    document_reference ||--o{ document_reference_content : "has"
    document_reference }o--o| concept : "type_concept"
    document_reference }o--o| concept : "security_concept"
    document_reference }o--o| provider : "author"
    document_reference }o--o| location : "location"
    document_reference }o--o| encounter : "encounter"
    document_reference }o--o| orders : "order"
    document_reference }o--o| patient : "patient"
    document_reference }o--|| users : "creator"
    document_reference }o--o| users : "changed_by"
    document_reference_content }o--|| users : "creator"
    document_reference_content }o--o| users : "changed_by"
    document_reference_content }o--o| users : "voided_by"

    document_reference {
        int id PK "auto_increment"
        char(38) uuid UK "not null"
        varchar(100) master_identifier
        varchar(50) status "not null"
        varchar(50) doc_status
        int type_concept_id FK
        datetime date_started
        datetime date_ended
        int author_id FK
        int location_id FK
        text description
        int security_concept_id FK
        int encounter_id FK
        int subject_id FK
        int order_id FK
        int creator FK "not null"
        datetime date_created "not null"
        int changed_by FK
        datetime date_changed
    }

    document_reference_content {
        int id PK "auto_increment"
        int document_reference_id FK "not null"
        varchar(255) content_type
        varchar(512) content_url
        boolean voided "default false, not null"
        int creator FK "not null"
        datetime date_created "not null"
        int changed_by FK
        datetime date_changed
        int voided_by FK
        datetime date_voided
        varchar(255) void_reason
        char(38) uuid UK "not null"
    }

    concept {
        int concept_id PK
    }

    provider {
        int provider_id PK
    }

    location {
        int location_id PK
    }

    encounter {
        int encounter_id PK
    }

    orders {
        int order_id PK
    }

    patient {
        int patient_id PK
    }

    users {
        int user_id PK
    }

Also note that FHIR DocumentReference has author[ ], whose entries can be a practitioner or an organization.

document_reference.author_id refers to FHIR DocumentReference.author, where the author is of type Practitioner.
document_reference.location_id refers to FHIR DocumentReference.author, where the author is an Organization. We propose that we identify locations that are tagged as Organization. However, it is often the case that such organizations are not known to the EMR in advance, yet the EMR still needs to capture minimum details of the organization (the name, for example). Two options:
- add an additional document_reference.external_org_name column, mapped to an extension.
- or capture it as an attribute in document_reference_attribute, mapped to extensions. Attributes do bring additional metadata capability to the database model.

Please share your thoughts and ideas.

dkayiwa · October 28, 2025, 5:30pm

I am very sure you must have looked at the attachments module. But just for the sake of completeness, do you mind also sharing the problems that you found with it? Some sort of pros and cons.

angshuonline · October 28, 2025, 6:35pm

I am guessing you already know why:

Mixed with obs - the attachments omod saves documents as obs. While that does provide some context, it does not provide the ability to model additional metadata like author, organization, context period, or security.
- Note, we use the same in Bahmni for complex data, and it is not easy at all to query, list, or search for documents, or to apply contextual access - never mind the storage itself.
Scalability - Let's stop abusing “obs” for everything and create custom data structures around it. /v1/Attachments/ has its purpose of course, and it has served well - imo, it is still appropriate in contexts where performance, security, querying and complexity are not the primary concerns. It comes straight out of the box, and often just the filesystem and configuration as obs (especially in form entry) is enough to get started.

Imho,

- separation of concerns - metadata about a document is different from its content. You would often query for the metadata and retrieve the actual data only when really needed. Access is related too: you may be allowed to see the metadata but not necessarily the actual content. These are all separate concerns and are best managed separately.
- storage - especially on cloud, but even on-premise, the storage can be abstracted out to a separate sub-system. S3, Blob Storage, and MinIO provide such means in a safe, fault-tolerant way. I am aware of @raff's work towards StorageService and I think those are the right steps.
- querying - the need to query in context is far more common than actual retrieval of the content, and a separate metadata model is useful for such cases. For example, the above model can easily accommodate an Attachment by just adding its URL:

document_reference_content.content_url = "/v1/attachment/uuidofObs"

- access privilege - this is again a separate concern and must not be reduced to @Authorized(AttachmentsConstants.VIEW_ATTACHMENTS) alone. Access rights vary depending on the document type and context.

In the proposed model, we are only talking about the FHIR DocumentReference (resource) and not the Attachment (datatype). One may effectively use the /v1/Attachment REST API in combination with it.

dkayiwa · October 28, 2025, 7:31pm

Makes lots of sense.

I like the ability to extend via attributes.

Are you planning to add this functionality via a module?

angshuonline · October 29, 2025, 9:20am

I was wondering if that should be a separate module. Right now, we plan to keep it in Bahmni’s FHIR extension module, but we can easily carve out a separate module for it later with very little effort. (I am just being lazy about setting up the build, packaging, distribution assembly, etc. for now.)

Even if we don’t expose REST APIs for all CRUD operations on attributes, the minimum we will need is GET /document-reference-attribute-types. In a separate omod I can keep both APIs (OpenMRS REST and FHIR REST), but in a FHIR-specific omod that is not the best fit.

What's your opinion?

dkayiwa · October 29, 2025, 1:33pm

Oh yes, lazy can be a good attribute. It leads you to automate things and to avoid over-designing or doing more than you will eventually need. So I agree that starting with the simpler option of going with the Bahmni FHIR extension module is good.

Shouldn’t the FHIR REST API alone be enough? We are generally trying, as much as we can, to move away from the OpenMRS REST API to the FHIR REST API.

angshuonline · October 29, 2025, 4:41pm

Well, the right way is through a published structure definition of the extensions for document reference through an IG.

We define attributes for DocumentReference (or, say, any other attributable resource in OpenMRS, like Location). These attributes are often custom and vary from implementation to implementation. So an extension like DocumentReference.externalOrgName would need to be defined if we are to accept submissions or return resources like the one below.

{
  "resourceType": "DocumentReference",
  ....
  "extension": [{
    "url": "http://fhir.example.org/ext/document-reference/attributes#external-org-name",
    "valueString": "Example Lab"
  }]
}

Notice the attribute name in the URL. In OpenMRS, attribute names can contain spaces, so resolving the URL to an attribute is a little tricky. On the backend, we could provide a hook for implementers to define that mapping programmatically through their own modules, or through a simple JSON map file somewhere. The main problem is not really the resource representation, but how each implementation describes to any client which attributes are possible (and does so dynamically).
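
As a rough illustration of the name-resolution problem, the sketch below (Python for brevity) derives a URL-safe fragment from an attribute type name and builds the extension-URL-to-attribute map by convention. The `slugify` rule and the function names are assumptions; an implementation could just as well load the map from a JSON file.

```python
import re

# Prefix taken from the example extension URL above
EXT_PREFIX = "http://fhir.example.org/ext/document-reference/attributes#"


def slugify(attribute_name):
    """Lower-case the name and collapse anything that is not a-z or 0-9
    into single hyphens, e.g. 'External Org Name' -> 'external-org-name'."""
    return re.sub(r"[^a-z0-9]+", "-", attribute_name.lower()).strip("-")


def build_extension_map(attribute_names):
    """Map extension URL -> attribute type name. Two names that collapse to
    the same slug are rejected, since the URL could not be resolved back."""
    mapping = {}
    for name in attribute_names:
        url = EXT_PREFIX + slugify(name)
        if url in mapping:
            raise ValueError(
                f"'{name}' and '{mapping[url]}' both map to {url}"
            )
        mapping[url] = name
    return mapping
```

The collision check matters because the slug is lossy: "External Org Name" and "external-org name" would otherwise silently resolve to the same extension.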

The typical way to deal with a declared extension is to publish its structure definition for DocumentReference. Ideally, each attribute should have its own structure definition. Whether implementations would actually do so is another matter. Larger organizations that are looking seriously at FHIR-based integration across the ecosystem may well do it. But I don’t see smaller setups doing so, since they mostly concentrate on getting their system operational within their limited context. This is where implementers keep reinventing and customizing things. The question is how we help them.
For the above class of implementers, or for the UX or client system, to discover such extensions, they must interpret the StructureDefinition of the DocumentReference. With a typical REST API, the client (UX or API) would first get the attribute definitions and interpret the datatype, name, etc. from an endpoint such as:

/v1/locationattributetype

Now, as a replacement, it is possible to return the structure definitions from a running server, but those would be undeclared extensions. We could introduce an API that returns the structure definition of a DocumentReference extension dynamically when called. For example:

{
  "resourceType": "StructureDefinition",
  "id": "external-org-name",
  "url": "http://your-org.org/fhir/StructureDefinition/external-org-name",
  "version": "1.0.0",
  "name": "Name of the attribute",
  "title": "Description of the attribute",
  "status": "active",
  "publisher": "Your Organization",
   .... 
  "fhirVersion": "4.0.1", 
  "kind": "complex-type",
  "abstract": false,
  "context": [
    {
      "type": "element",
      "expression": "DocumentReference"
    }
  ],
  "type": "Extension",
  "baseDefinition": "http://hl7.org/fhir/StructureDefinition/Extension",
  "derivation": "constraint",
  "differential": {
    "element": [
      {
        "id": "Extension",
        "path": "Extension",
        "min": 0,
        "max": "1"
      },
      {
        "id": "Extension.extension",
        "path": "Extension.extension",
        "max": "0",
        "short": "Must not have children extensions" 
      },
      {
        "id": "Extension.url",
        "path": "Extension.url",
        "fixedUri": "http://your-org.org/fhir/StructureDefinition/external-org-name"
      },
      {
        "id": "Extension.value[x]",
        "path": "Extension.value[x]",
        "min": 1,
        "type": [
          {
            "code": "string" 
          }
        ]
      }
    ]
  }
}
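
A minimal sketch of how such a definition could be generated from an attribute type at request time, in Python for brevity. The function name, the datatype-to-FHIR mapping, and the fallback to string are assumptions; the output mirrors the shape of the example above and omits the elided fields.

```python
# Assumed mapping from OpenMRS custom datatype class names to FHIR primitive types
DATATYPE_TO_FHIR = {
    "org.openmrs.customdatatype.datatype.FreeTextDatatype": "string",
    "org.openmrs.customdatatype.datatype.BooleanDatatype": "boolean",
    "org.openmrs.customdatatype.datatype.DateDatatype": "date",
    "org.openmrs.customdatatype.datatype.FloatDatatype": "decimal",
}


def structure_definition_for(slug, description, datatype_classname, base_url):
    """Build an extension StructureDefinition (as a dict) for one attribute type."""
    url = f"{base_url}/StructureDefinition/{slug}"
    # Unknown datatypes fall back to string in this sketch
    fhir_type = DATATYPE_TO_FHIR.get(datatype_classname, "string")
    return {
        "resourceType": "StructureDefinition",
        "id": slug,
        "url": url,
        # Computer-friendly name, e.g. 'external-org-name' -> 'ExternalOrgName'
        "name": "".join(part.capitalize() for part in slug.split("-")),
        "title": description,
        "status": "active",
        "fhirVersion": "4.0.1",
        "kind": "complex-type",
        "abstract": False,
        "context": [{"type": "element", "expression": "DocumentReference"}],
        "type": "Extension",
        "baseDefinition": "http://hl7.org/fhir/StructureDefinition/Extension",
        "derivation": "constraint",
        "differential": {
            "element": [
                {"id": "Extension", "path": "Extension", "min": 0, "max": "1"},
                {"id": "Extension.extension", "path": "Extension.extension", "max": "0"},
                {"id": "Extension.url", "path": "Extension.url", "fixedUri": url},
                {
                    "id": "Extension.value[x]",
                    "path": "Extension.value[x]",
                    "min": 1,
                    "type": [{"code": fhir_type}],
                },
            ]
        },
    }
```

Because the definition is derived entirely from the attribute type, a client could fetch it on demand instead of relying on a published IG, at the cost of the extension remaining undeclared.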

Hope the above makes sense. Please share if you think there are easier or alternative ways of solving this.

As you can see, this introduces more work (as Archimedes said: “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world”) - we all have constraints.

angshuonline · October 29, 2025, 4:54pm

For now, I am inclined to go with a convention and programmatic means (and not introduce any API, whether REST or StructureDefinition, to define the attributes):

- inspect the submitted DocumentReference for the extension http://fhir.example.org/ext/document-reference/attributes#external-org-name
- look up external-org-name either from the attribute list, or from a JSON file, or through a registered interface implementation, for example Map<String, String> DocumentReferenceAttributeInterpreter.extensionAttributeMap()

The above still does not solve discovery, machine readability, or dynamic interpretation by a client (UX/API). So for now, we assume the client knows about these attributes, and we throw an error if the backend does not understand them.
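
The inspect-and-lookup convention could look roughly like this (a Python sketch for brevity; in the module it would sit behind the Java interface mentioned above). The function and exception names are hypothetical, and the choice to ignore extensions outside the attribute namespace is an assumption.

```python
EXT_PREFIX = "http://fhir.example.org/ext/document-reference/attributes#"


class UnknownAttributeExtension(ValueError):
    """Raised when a submitted attribute extension has no known attribute type."""


def extract_attributes(resource, extension_attribute_map):
    """Return {attribute type name: value} for the attribute extensions on a
    submitted DocumentReference. extension_attribute_map maps extension URL
    to attribute type name, like extensionAttributeMap() would."""
    attributes = {}
    for ext in resource.get("extension", []):
        url = ext.get("url", "")
        if not url.startswith(EXT_PREFIX):
            continue  # not an attribute extension; left to other handlers
        if url not in extension_attribute_map:
            raise UnknownAttributeExtension(url)
        # A simple extension carries exactly one value[x] element
        values = [v for k, v in ext.items() if k.startswith("value")]
        if len(values) != 1:
            raise ValueError(f"expected exactly one value[x] on {url}")
        attributes[extension_attribute_map[url]] = values[0]
    return attributes


# Example map, as a module or a JSON file might supply it
ATTRIBUTE_MAP = {EXT_PREFIX + "external-org-name": "External Org Name"}

submitted = {
    "resourceType": "DocumentReference",
    "extension": [
        {"url": EXT_PREFIX + "external-org-name", "valueString": "Example Lab"}
    ],
}
print(extract_attributes(submitted, ATTRIBUTE_MAP))
```

Failing fast on an unrecognized attribute URL keeps the client from silently losing data it believes was stored.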