I’ve been thinking about report generation performance in large OpenMRS implementations and wanted to get some input before exploring this further.
For complex cohort reports, for example “HIV-positive women aged 15–49 who attended ANC this month and are on ART”, the Reporting Module evaluates each sub-query sequentially. On datasets of 50,000+ patients, this can mean 30–60 second waits and occasional timeouts.
As far as I can tell, the current Reporting Module does not offer parallel processing for independent cohort sub-queries: even when those queries have no dependency on each other, they still run one after another (correct me if I’m wrong). Would adding that capability meaningfully reduce report generation time on large datasets?
Here is what I’m exploring: fire independent sub-queries simultaneously and combine the results at the end, while dependent queries still wait for their parent results as they do today. This could sit entirely above the existing EvaluationService without touching any current interfaces or report definitions.
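To make the idea concrete, here is a rough sketch in plain Java. It does not call any Reporting Module API: each `Supplier<Set<Integer>>` is only a stand-in for "evaluate one independent cohort sub-query and return patient IDs", and the class and method names (`ParallelCohortSketch`, `evaluateAll`) are made up for illustration. It also assumes the sub-queries are combined with AND, so the results are intersected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: not part of the Reporting Module.
public class ParallelCohortSketch {

    // Submits every independent sub-query to the pool at once,
    // then waits for all of them and intersects the patient-ID sets.
    public static Set<Integer> evaluateAll(List<Supplier<Set<Integer>>> subQueries,
                                           ExecutorService pool) {
        List<CompletableFuture<Set<Integer>>> futures = new ArrayList<>();
        for (Supplier<Set<Integer>> query : subQueries) {
            futures.add(CompletableFuture.supplyAsync(query, pool));
        }

        Set<Integer> result = null;
        for (CompletableFuture<Set<Integer>> future : futures) {
            Set<Integer> ids = future.join(); // blocks until this sub-query is done
            if (result == null) {
                result = new HashSet<>(ids);
            } else {
                result.retainAll(ids); // AND composition = set intersection
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Stand-ins for e.g. "HIV-positive", "attended ANC", "on ART".
            List<Supplier<Set<Integer>>> queries = new ArrayList<>();
            queries.add(() -> new HashSet<>(Arrays.asList(1, 2, 3, 4)));
            queries.add(() -> new HashSet<>(Arrays.asList(2, 3, 4, 5)));
            queries.add(() -> new HashSet<>(Arrays.asList(3, 4, 6)));
            System.out.println(evaluateAll(queries, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

A dependent query could be chained onto its parent with `thenApplyAsync` instead of being submitted up front, which would keep the "wait for the parent" behaviour. Whether the real evaluators can safely run on worker threads is exactly the part I’m unsure about.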
I’d like to understand, though:
- Is sequential evaluation actually the bottleneck, or does the pain come from elsewhere (slow SQL, DB indexing, memory)?
- Are there any concerns about introducing parallelism here, such as connection pooling, thread safety, or anything else?
- Are there implementations that have already worked around this somehow?
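On the connection pooling question, one option I can imagine is capping the number of sub-queries in flight with a fixed-size thread pool kept smaller than the database connection pool. The sketch below is plain Java with no OpenMRS calls; `BoundedEvaluationSketch` and `peakConcurrency` are hypothetical names, and the `Thread.sleep` is only a placeholder for a sub-query holding a connection. It records how many tasks were ever running at the same time, to show that the cap holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: not part of the Reporting Module.
public class BoundedEvaluationSketch {

    // Runs taskCount placeholder tasks on a pool of maxParallel threads and
    // returns the highest number of tasks observed running simultaneously.
    public static int peakConcurrency(int taskCount, int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(20); // placeholder for a sub-query holding a DB connection
                    running.decrementAndGet();
                    return null;
                });
            }
            // invokeAll blocks until every task has completed.
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get(); // surfaces any exception thrown inside a task
            }
            return peak.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("placeholder task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Ten tasks, at most three in flight at any moment.
        System.out.println("peak concurrency: " + peakConcurrency(10, 3));
    }
}
```

This only limits how many sub-queries run at once. It says nothing about whether sessions or evaluation contexts can be shared across threads, which is a separate question.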
Just curious, and checking whether this is a real problem worth solving.