Excel Data Import and Export Feature for OpenMRS — Request for Technical Feedback

Hi everyone :waving_hand:,

I’m working on an Excel import and export feature for OpenMRS. I’m currently finalizing the requirements and wanted to bring the approach to the community especially to get technical feedback, architectural guidance, and possible alignment with existing data-exchange patterns.

Overview

This work targets implementations that frequently collect structured data offline (for example, outreach programs or facility sites with intermittent connectivity). The idea is to allow offline data entry in Excel, then upload and validate those records directly into OpenMRS.

We’re breaking this into two key components:

1. Excel Template Generation

The module should generate an Excel workbook based on selected O3 form definitions / schemas where each form becomes a worksheet (e.g., Patient Registration, Mental Health Assessment, SOAP Note, etc). The generation process should read O3’s JSON schema (stored in clob_datatype_storage) and converts it to an Excel structure.

Answers rendered as radio/select should be turned into Excel dropdowns using Apache POI’s DataValidation API. Each sheet should include data-type formatting (date, numeric, coded), column metadata, and optional hidden validation logic.

Example: O3_Form_Spreadsheet - Google Sheets

2. Excel Import and Validation

After data capture, the user should be able to upload the same workbook into OpenMRS. The backend module should validate the workbook structure , run a dry-run preview, and then persist valid records as encounters and observations. Invalid or incomplete rows are should automatically quarantined into a validation report for review and correction.

Data import must balance flexibility and data quality. Excel data entry requires relaxed validation rules while still identifying and isolating non-processable rows (missing identifiers, invalid formats, or unrecognized patients). Too much in-Excel constraint (e.g., dropdowns) can hurt usability, especially in low-resource contexts. Metadata such as location, user, and import timestamp will be captured at upload for better traceability.

Patient matching should prioritize human-readable identifiers over UUIDs, avoiding over-aggressive deduplication and relying on existing patient merge tools to handle duplicates when necessary. (excerpt from my earlier discussion with @ibacher )

Architecture Overview


User Flow Overview


Eager to hear thoughts, guidance, and any suggestions on improving the architecture and direction Thank you .

cc: @grace @burke @dkayiwa @ibacher @michaelbontyes @raff @jnsereko @wikumc @ruhanga @samuel34 @ICRC @Madiro @dev5 @dev4 @dev3

4 Likes

Are you planning to submit new patients and then also submit observations for them using other spreadsheets? If yes, how are you planning to link them?

I hadn’t given it much thought but i presume that each sheet will include a column (e.g Patient Ref) that references a patient and links back to the patient’s record created in the patient registration sheet. When the workbook is uploaded, the importer first creates or matches patients, then uses the column to attach each encounter (and its associated Obs) to the correct patient ensuring all data from multiple sheets remains consistently linked to that one patient.

So if there’s a value in the Patient Ref column (or w/e) that doesn’t match a known patient is that handled as an error or new registration?

If a Patient Ref in any sheet points to a value that doesn’t exist in the Patient Registration sheet (and therefore can’t be resolved to the active patient in that workbook), we will flag it during validation and quarantine the affected rows.

Those records won’t be imported until the reference is corrected. This ensures that no orphaned observations or encounters are created without a valid patient link, while still allowing the rest of the workbook to proceed with import.

1 Like

How would you handle form questions that allow multiple answers, e.g., for diagnoses or similar? Or is that out-of-scope for now?

That’s not something i have explored yet but i should.

Is there a user facing workflow for failed imports ie. a UI that let’s you fix the corrupt patient ref?

1 Like

An in-app correction view that allows users to fix errors directly within the UI after a dry-run before final submission is something that i can look to add in the long run but for now users will see a summary report highlighting problematic rows (e.g., missing or invalid patient references).

Is the “Import Ok” check intended to be per-row or for the whole document? I’m wondering because it has implications for what happens if, say, I fill in 100 rows and 60 are fine but 40 have issues. I.e., if the 60 that were fine were already processed, they might be double-processed when I re-upload the sheet having fixed the 40 issues.

I had thought about it more as a check for the whole document, but I see what you mean. It makes much more sense to treat the import as per-row instead that way, successfully imported rows can be safely skipped on re-uploads, and only failed ones get reprocessed.

This reminds me about some bits of the formentry module where we used msinfopath for data entry with a template for a patient downloaded, which contains some pre-filled data and then submitted to a folder on the file system for later processing by a scheduled task.

1 Like

Definitely, the difference here is that we are not downloading the Excel template for existing patients. I think it’s a nice-to-have feature, especially for workflows like HIV clinics where clinicians might want to download templates pre-filled with patient data for scheduled appointments (e.g., for viral load follow-up).

The current focus, however, is on generating blank but structured templates derived from O3 forms for bulk data entry mainly for new patient registration and encounter capture. Later iterations could extend this to support pre-filled, patient-specific templates similar to the old InfoPath-based FormEntry workflow, but with Excel + POI and a web upload process instead of the file system watcher.

Given that each workbook represents a single patient’s data, it’s worth considering whether we need a Patient Ref column on any sheet other than the one containing patient details. Otherwise it may be redundant elsewhere.

I’d lean toward a per-sheet approach, since that better mirrors standard form validation logic. If one part of a form is invalid, typically the entire form is blocked from saving. This model may be more intuitive and could also help minimize complications around undoing persisted changes or voiding partial data.

1 Like

I think the idea here is that a single sheet in a single workbook can be used to capture multiple patients data with each sheet being ~a form and each row being the data that would’ve been submitted by via that form.

2 Likes

Ahh, right. Makes sense then to import and validate per row as suggested earlier.

1 Like