Question about archiving design for voided observations (TRUNK-6619)

Hi everyone,

I’m currently working on improving the archiving workflow for voided observations (TRUNK-6619).

While implementing idempotent archiving, I explored using a separate table (obs_archive) along with safeguards like NOT EXISTS to prevent duplicate entries and ensure safe deletion in case of partial failures.
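To make the idea concrete, here is a minimal sketch of the guard, not the actual OpenMRS code. It uses Python with SQLite purely for illustration; the cut-down columns and the `archive_voided_obs` helper are assumptions, not the real schema or API.

```python
import sqlite3

# Hypothetical, cut-down schema for illustration only;
# the real obs table has many more columns.
SCHEMA = """
CREATE TABLE obs (
    obs_id    INTEGER PRIMARY KEY,
    person_id INTEGER,
    voided    INTEGER NOT NULL
);
CREATE TABLE obs_archive (
    obs_id    INTEGER PRIMARY KEY,
    person_id INTEGER
);
"""


def archive_voided_obs(conn: sqlite3.Connection) -> int:
    """Copy voided rows into obs_archive, then delete them from obs.

    Safe to re-run: the NOT EXISTS guard skips rows that were already
    copied, and the delete only removes rows confirmed to be archived.
    Returns the number of rows deleted from obs.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO obs_archive (obs_id, person_id)
            SELECT o.obs_id, o.person_id
            FROM obs o
            WHERE o.voided = 1
              AND NOT EXISTS (
                  SELECT 1 FROM obs_archive a WHERE a.obs_id = o.obs_id
              )
            """
        )
        cur = conn.execute(
            """
            DELETE FROM obs
            WHERE voided = 1
              AND EXISTS (
                  SELECT 1 FROM obs_archive a WHERE a.obs_id = obs.obs_id
              )
            """
        )
        return cur.rowcount
```

If an earlier run copied a row but failed before deleting it, a second run skips the copy for that row and still removes it from obs, so nothing is duplicated or lost.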

Before proceeding further, I wanted to ask:

What is the recommended approach in OpenMRS for handling archiving of data like this?

Should a new archive table (e.g., obs_archive) be introduced via Liquibase, or is there an existing pattern I should align with?

I want to ensure the design is consistent with existing practices in the project before making further changes.

Thanks in advance!

Hi @varshithreddy, thanks for raising this. Here are my thoughts.

On the obs_archive table approach:

Your instinct is well-aligned with community thinking. A separate archive table is the right structural approach, since obs is the primary pressure point — voided observations accumulate naturally due to OpenMRS’s correction lifecycle (new Obs replace old ones rather than updating in place).

On idempotency and safe deletion:

The NOT EXISTS guard is a solid pattern. The community has also explored fully transactional batches with a “retry and isolate” fallback: large batches are attempted first and, on failure, retried in smaller micro-batches to isolate problematic rows without blocking the whole process.

On relational integrity:

I think this is the most critical piece. The obs table has several relationships to respect, like group hierarchies (obs_group_id), version chains (previous_version), and external anchors (encounter_id, person_id, and order_id). A bottom-up strategy is what I would recommend: archive leaf obs first (no children, not referenced as previous_version by an active obs), then work up the tree.
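As a rough illustration of the bottom-up ordering, the sketch below works on in-memory rows rather than the real table. The `Obs` dataclass and the `bottom_up_passes` function are hypothetical names, and the leaf rule is just the one described above: a voided obs with no remaining children that no active obs points to as previous_version.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Obs:
    """Hypothetical, simplified stand-in for a row of the obs table."""
    obs_id: int
    voided: bool
    obs_group_id: Optional[int] = None
    previous_version: Optional[int] = None


def bottom_up_passes(rows: List[Obs]) -> List[List[int]]:
    """Return obs_ids grouped into archiving passes, leaves first."""
    remaining: Dict[int, Obs] = {o.obs_id: o for o in rows}
    passes: List[List[int]] = []
    while True:
        # Any obs that still has a child in the table is not a leaf.
        parents = {
            o.obs_group_id for o in remaining.values()
            if o.obs_group_id is not None
        }
        # Any obs that an active obs names as its previous version is pinned.
        pinned = {
            o.previous_version for o in remaining.values()
            if not o.voided and o.previous_version is not None
        }
        leaves = sorted(
            o.obs_id for o in remaining.values()
            if o.voided and o.obs_id not in parents and o.obs_id not in pinned
        )
        if not leaves:
            return passes
        passes.append(leaves)
        for obs_id in leaves:
            del remaining[obs_id]
```

Each pass removes the current leaves, which can turn their parents into leaves for the next pass. Voided rows that stay blocked, for example a group with an active child, are simply never returned.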

On Liquibase:

Yes, I think introducing obs_archive via Liquibase is consistent with how all schema changes are managed in openmrs-core. Ensure the changeset is idempotent and properly tagged.


Thanks a lot for the detailed explanation, this really helps.

The point about relational integrity and archiving leaf observations first makes a lot of sense. I had not considered the impact of obs_group_id and previous_version chains in the current approach.

I will revisit the implementation to incorporate a bottom-up strategy and also proceed with introducing the obs_archive table via Liquibase.

Thanks again for the guidance!

Hi @varshithreddy, I think the topic you raised is also one of the GSoC 2026 project ideas: the archiving voided data project, which needs a separate obs_archive table like the one you mentioned. So introducing the table in a PR is not required for now. This issue can be addressed once that project has been planned and resolved. At that point the table will be available to all developers, and anyone can use and extend it wherever it is needed.

Hi @binayak, thanks for the insight, that makes a lot of sense.

I understand that introducing the obs_archive table is part of a larger planned effort (like the GSoC archiving project), so it’s better not to include schema changes in this PR right now.

I’ll keep this PR focused on improving the safety and idempotency of the archiving logic, and it can later integrate with the archive table once the overall design is finalized.
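For reference, the “retry and isolate” fallback mentioned earlier in the thread could look roughly like the sketch below. It is only an illustration, not code from the PR: `archive_with_isolation` and the `archive_batch` callback are hypothetical names, and the callback is assumed to be all-or-nothing (it either archives every id it is given in one transaction or raises and changes nothing).

```python
from typing import Callable, List, Sequence, Tuple


def archive_with_isolation(
    ids: Sequence[int],
    archive_batch: Callable[[List[int]], None],
    micro_size: int = 1,
) -> Tuple[List[int], List[int]]:
    """Try the whole batch first; on failure, retry in micro-batches.

    Returns (archived_ids, failed_ids). Rows in a failing micro-batch are
    reported as failed instead of blocking the rest of the run.
    """
    if micro_size < 1:
        raise ValueError("micro_size must be at least 1")
    batch = list(ids)
    if not batch:
        return [], []
    try:
        archive_batch(batch)  # fast path: one large transactional batch
        return batch, []
    except Exception:
        pass  # fall through and isolate the problematic rows

    archived: List[int] = []
    failed: List[int] = []
    for start in range(0, len(batch), micro_size):
        chunk = batch[start:start + micro_size]
        try:
            archive_batch(chunk)
            archived.extend(chunk)
        except Exception:
            failed.extend(chunk)
    return archived, failed
```

With `micro_size=1` every bad row is isolated individually; a larger value trades precision for fewer transactions.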

Appreciate the clarification!

Yes, thanks for understanding, @varshithreddy.