Hello everyone,
This is the proposed implementation for my GSoC 2026 project - Archiving voided data. I am posting this here for community feedback, as suggested by my mentors @dkayiwa and @jonathan. I’ll be taking the obs table as an example.
This approach creates an obs_archive table identical to obs (but without inbound foreign keys, to prevent locking). It moves voided data to obs_archive the moment the voided bit is flipped. In effect, presence in the archive table becomes synonymous with the voided column being set.
Implementation Plan:
- One-Time Legacy Migration: Before activating the trigger, a migration script copies all existing rows with `voided = 1` from `obs` to `obs_archive`, and finally hard-deletes them from the primary table to clear historical bloat.
- Change Foreign Key Constraints:
  - Drop the existing FK on `obs.previous_version → obs.obs_id`.
  - Add a new FK: `obs.previous_version → obs_archive.obs_id` (an active obs's previous version always points to a voided, and therefore archived, row). For an archived obs, the previous version would likewise reference the archive table.
- Move-and-Delete Mechanism: Implement via a database `AFTER UPDATE` trigger or a Hibernate `onFlushDirty` interceptor. When the `voided` bit is flipped from 0 to 1, the system executes the following in a single atomic transaction:
  - Copy: Insert the row into `obs_archive`.
  - Hard Delete: Immediately and physically delete the row from the primary `obs` table.

  Note: Archiving a parent obs would archive all of its child obs too (just like the current implementation of voiding).
- Restore Strategy: Restoration is handled by a Spring service that performs the reversal: it moves the record from `obs_archive` back into `obs`, resets the `voided` bit to `0`, and re-establishes the FK reference to the active `obs` table (for `previous_version`).
- Future-proofing (Purge): The archive table could be partitioned by date-time, so that a future feature for dropping old records (say, 10 or 15 years old) is easy to implement if required.
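
To make the migration, move-and-delete, and restore steps concrete, here is a minimal sketch of the data movement only. It is not the proposed implementation: it uses Python's built-in `sqlite3` as a stand-in for the real database, a heavily trimmed `obs` schema, and hypothetical function names (`migrate_legacy`, `archive_obs`, `restore_obs`). It also does the move in application code rather than in a trigger, partly because MySQL does not let a trigger modify the table it is defined on. Restoring children together with a parent is left out on purpose.

```python
import sqlite3

# Assumption: trimmed stand-in schema; the real obs table has many more columns.
COLS = "obs_id, obs_group_id, previous_version, voided, date_voided"
SCHEMA = """
CREATE TABLE obs (obs_id INTEGER PRIMARY KEY, obs_group_id INTEGER,
                  previous_version INTEGER, voided INTEGER NOT NULL DEFAULT 0,
                  date_voided TEXT);
CREATE TABLE obs_archive (obs_id INTEGER PRIMARY KEY, obs_group_id INTEGER,
                          previous_version INTEGER, voided INTEGER NOT NULL DEFAULT 1,
                          date_voided TEXT);
"""


def migrate_legacy(conn):
    """One-time step: copy every already-voided row, then hard-delete it."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(f"INSERT INTO obs_archive ({COLS}) SELECT {COLS} FROM obs WHERE voided = 1")
        return conn.execute("DELETE FROM obs WHERE voided = 1").rowcount


def collect_group(conn, root_id):
    """Return root_id plus all descendants linked through obs_group_id."""
    ids, frontier = [root_id], [root_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT obs_id FROM obs WHERE obs_group_id IN ({marks})", frontier
        ).fetchall()
        frontier = [r[0] for r in rows]
        ids.extend(frontier)
    return ids


def archive_obs(conn, obs_id, date_voided):
    """Void an obs and its children, copy them to obs_archive, delete from obs."""
    with conn:
        ids = collect_group(conn, obs_id)
        marks = ",".join("?" * len(ids))
        conn.execute(
            f"UPDATE obs SET voided = 1, date_voided = ? WHERE obs_id IN ({marks})",
            [date_voided, *ids],
        )
        conn.execute(
            f"INSERT INTO obs_archive ({COLS}) SELECT {COLS} FROM obs WHERE obs_id IN ({marks})",
            ids,
        )
        conn.execute(f"DELETE FROM obs WHERE obs_id IN ({marks})", ids)
    return ids


def restore_obs(conn, obs_id):
    """Move one row back into obs with voided reset to 0."""
    with conn:
        conn.execute(
            f"INSERT INTO obs ({COLS}) "
            "SELECT obs_id, obs_group_id, previous_version, 0, NULL "
            "FROM obs_archive WHERE obs_id = ?",
            (obs_id,),
        )
        conn.execute("DELETE FROM obs_archive WHERE obs_id = ?", (obs_id,))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO obs (obs_id, obs_group_id, voided) VALUES (?, ?, ?)",
    [(1, None, 0), (2, 1, 0), (3, None, 1), (4, None, 0)],
)
conn.commit()

migrated = migrate_legacy(conn)                 # obs 3 was already voided
archived = archive_obs(conn, 1, "2026-01-01")   # parent 1 and its child 2
restore_obs(conn, 2)                            # bring the child back

active_ids = [r[0] for r in conn.execute("SELECT obs_id FROM obs ORDER BY obs_id")]
archive_ids = [r[0] for r in conn.execute("SELECT obs_id FROM obs_archive ORDER BY obs_id")]
print(active_ids, archive_ids)
```

Because each function wraps its copy and delete in one transaction, a failure between the two statements rolls both back, so a row should never end up in both tables or in neither.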
Questions for the community:
Should the data only be archived after 'x' days or months, as opposed to instantaneously as proposed?
If yes,
- We could use the existing JobRunr infrastructure to implement a recurring job that queries for `voided = 1` within a date range adjusted for the grace period and invokes the archival service.
- We could wire the grace period, the job's schedule interval, and the batch size to Global Properties so that admins can configure them.
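
As a rough illustration of the selection step such a recurring job would run, here is a small sketch, again using Python's `sqlite3` as a stand-in. The function name `find_archivable` and its parameters are hypothetical; in the real project this would be a JobRunr job in Java reading `grace_days` and `batch_size` from Global Properties. It assumes `date_voided` is stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta


def find_archivable(conn, now, grace_days, batch_size):
    """Return up to batch_size ids of obs voided at least grace_days ago, oldest first."""
    cutoff = (now - timedelta(days=grace_days)).isoformat()
    rows = conn.execute(
        "SELECT obs_id FROM obs "
        "WHERE voided = 1 AND date_voided <= ? "
        "ORDER BY date_voided LIMIT ?",
        (cutoff, batch_size),
    ).fetchall()
    return [r[0] for r in rows]


# Assumption: trimmed stand-in table with only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (obs_id INTEGER PRIMARY KEY, voided INTEGER, date_voided TEXT)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [
        (1, 1, "2026-01-01T00:00:00"),  # voided long enough ago
        (2, 1, "2026-03-01T00:00:00"),  # still inside the grace period
        (3, 0, None),                   # not voided
        (4, 1, "2025-12-01T00:00:00"),  # oldest voided row
    ],
)
conn.commit()

now = datetime(2026, 3, 10)
due = find_archivable(conn, now, grace_days=30, batch_size=10)
first_batch = find_archivable(conn, now, grace_days=30, batch_size=1)
print(due, first_batch)
```

Each id returned would then be handed to the archival service, and the job would repeat on its next scheduled run until no eligible rows remain.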
Any feedback on the implementation is greatly appreciated. Thank you.




