Analytics on FHIR

bashir · June 18, 2020, 7:20pm

Hi everyone,

There is interest in building a shared/unified solution for the ETL/Analytics needs of OpenMRS (e.g., see this post by @burke, the thread following it, and the two meetings that followed).

As some of you know, I have been looking into building this solution on top of FHIR. I wrote this doc and shared it with some folks last week for early feedback. Now, I would like to hear from everyone. I feel the doc format is easier for commenting on minor details, but please feel free to use this thread for higher-level discussions.

Here is a copy of the list of pros/cons as I understand them; this is not a replacement for reading everyone’s comments and other details in the doc (so please do read that doc if you are interested in this topic):

Pros:

The main benefit of using FHIR for analytics is standardization. This makes it easier to integrate OpenMRS data with other systems that can speak FHIR.
Another side effect of standardization is that data scientists do not need to understand the OpenMRS data model. In general, to work on OpenMRS analytics, one only needs to understand FHIR, which is well documented.
Again as a standardization side effect, analytics tools developed to work off of FHIR can be applied to OpenMRS analytics workloads without too much effort [with caveats].
To be able to do analytics on FHIR, we need to develop pipelines that translate OpenMRS changes to FHIR resources (both in batch and streaming modes). These pipelines are useful beyond the analytics use cases. For example, if we want to export OpenMRS data to a FHIR store (e.g., for a Shared Health Record system), we can leverage the same pipelines and mechanisms used for Analytics on FHIR.
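
To make the last point concrete, here is a minimal sketch of one conversion function shared by a batch path and a streaming path. The row layout (`uuid`, `given_name`, `family_name`, `gender`, `birthdate`) and the function names are assumptions made up for illustration, not the actual OpenMRS schema or any existing module; only the FHIR Patient field names come from the standard.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical mapping from OpenMRS-style gender codes to FHIR
# administrative gender values; anything else becomes "unknown".
GENDER_MAP = {"M": "male", "F": "female"}


def to_fhir_patient(row: Dict[str, str]) -> Dict[str, object]:
    """Convert one flattened, OpenMRS-like patient row to a FHIR Patient dict."""
    return {
        "resourceType": "Patient",
        "id": row["uuid"],
        "name": [{"family": row["family_name"], "given": [row["given_name"]]}],
        "gender": GENDER_MAP.get(row["gender"], "unknown"),
        "birthDate": row["birthdate"],
    }


def batch_export(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Batch mode: convert a full snapshot of rows in one pass."""
    return [to_fhir_patient(row) for row in rows]


def stream_export(changes: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Streaming mode: convert each change lazily, as it is consumed."""
    for change in changes:
        yield to_fhir_patient(change)


# Illustrative input row (assumed shape, not real data).
row = {
    "uuid": "abc-123",
    "given_name": "Jane",
    "family_name": "Doe",
    "gender": "F",
    "birthdate": "1990-04-12",
}
print(batch_export([row])[0])
```

Because both modes call the same `to_fhir_patient`, the resources sent to an analytics warehouse and to a FHIR store would stay consistent.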

Cons:

The main disadvantage is added complexity. FHIR is well documented, but it is still a huge standard and OpenMRS only uses a tiny portion of it.
Another aspect of the complexity issue is that queries become more complex because of the ARRAY and STRUCT column types.
There are already analytics solutions in the OpenMRS community and none of them are based on FHIR (AFAIK). Note that this is not entirely a disadvantage of using FHIR, because our goal is to develop a unified solution for OpenMRS Analytics and we need to unify those custom schemas. The benefit of FHIR is that we simply rely on a standard as the unifying schema.
There is definitely extra overhead in converting the OpenMRS data model into FHIR. This can especially make the batch pipeline for exporting OpenMRS data to the data warehouse more expensive.
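
As a rough illustration of the ARRAY/STRUCT point, the sketch below uses a plain Python dict shaped like a FHIR Observation. Even a simple "find observations with this code" filter has to walk the `code.coding` array, and producing flat rows means emitting one row per array element. The second coding system URL, the local code, and the helper names are hypothetical.

```python
from typing import Dict, Iterator

LOINC = "http://loinc.org"

# A FHIR-shaped Observation: `code` is a struct holding an array of structs.
observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "code": {
        "coding": [
            {"system": LOINC, "code": "29463-7"},  # LOINC body weight
            # Hypothetical local code system, for illustration only.
            {"system": "http://example.org/local-concepts", "code": "WEIGHT-KG"},
        ]
    },
    "valueQuantity": {"value": 62.5, "unit": "kg"},
}


def has_code(obs: Dict, system: str, code: str) -> bool:
    """True if any element of the nested coding array matches."""
    codings = obs.get("code", {}).get("coding", [])
    return any(c.get("system") == system and c.get("code") == code for c in codings)


def flatten(obs: Dict) -> Iterator[Dict]:
    """Emit one flat row per coding, similar to unnesting an array column."""
    quantity = obs.get("valueQuantity", {})
    for c in obs.get("code", {}).get("coding", []):
        yield {
            "obs_id": obs["id"],
            "system": c.get("system"),
            "code": c.get("code"),
            "value": quantity.get("value"),
            "unit": quantity.get("unit"),
        }


weights = [o for o in [observation] if has_code(o, LOINC, "29463-7")]
rows = list(flatten(observation))
print(len(weights), len(rows))
```

In a flat custom schema the same filter would be a single column comparison; the nested form trades that simplicity for the ability to carry several codings per observation.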

Adding a few people who have commented on or shown interest in this work before (as an FYI): @akimaina, @aojwang, @burke, @ccwhite23, @dkayiwa, @grace, @ibacher, @jennifer, @mksd, @mseaton, @pmanko, @wyclif

burke · June 22, 2020, 2:18pm

Thank you, @bashir, for your thoughtful write-up.

I think there’s a fundamental question of whether we are enforcing FHIR or encouraging/promoting FHIR in the warehouse.

FHIR-only Data Warehouse. Any data placed in the warehouse must first be mapped to FHIR. The warehouse runs scripts/tooling designed to work with FHIR data.
FHIR-focused Data Warehouse. One of the first steps performed by the warehouse is to transform data into FHIR; any data received in FHIR format can bypass this step. The warehouse is agnostic to the data format. We focus our efforts on FHIR-based tooling, but the warehouse can accommodate scripts/tooling that uses other data formats.
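
One possible reading of the FHIR-focused option, sketched as an ingestion step: records already in FHIR bypass conversion, records with a known converter are transformed to FHIR, and everything else is still stored. The `Warehouse` class, its method names, and the use of a `resourceType` key as the "is this FHIR?" check are simplifying assumptions for illustration, not an existing API.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


class Warehouse:
    """Toy in-memory warehouse with a FHIR area and a non-FHIR area."""

    def __init__(self) -> None:
        self.fhir: List[Record] = []
        self.other: List[Record] = []

    def ingest(
        self,
        record: Record,
        to_fhir: Optional[Callable[[Record], Record]] = None,
    ) -> str:
        # Assumption: a "resourceType" key marks data already in FHIR format.
        if "resourceType" in record:
            self.fhir.append(record)  # bypass the transformation step
            return "fhir"
        if to_fhir is not None:
            self.fhir.append(to_fhir(record))  # transform first, then store
            return "converted"
        self.other.append(record)  # outside FHIR scope, still accepted
        return "non-fhir"


wh = Warehouse()
print(wh.ingest({"resourceType": "Patient", "id": "p1"}))
print(wh.ingest({"uuid": "p2"}, to_fhir=lambda r: {"resourceType": "Patient", "id": r["uuid"]}))
print(wh.ingest({"custom_metric": 7}))
```

A FHIR-only warehouse would instead reject the third record, which is the main practical difference between the two options.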

I agree with the notion of promoting FHIR and focusing our efforts on leveraging the standard, since the number of tools that can work with FHIR data will only grow over time; however, I favor a FHIR-focused approach where we work to pave the road for FHIR data, but don’t assume all data will be FHIR-formatted. This both allows for wider adoption of the warehouse tooling and can provide a mechanism for data outside the scope of FHIR to leverage the data warehouse.

bashir · June 24, 2020, 2:19am

Thanks @burke for the notes (here and in the doc). The core of this proposal is to make the FHIR-based schema the foundation of any data warehouse solution; of course, other views and easier-to-work-with formats can be built on top of that FHIR-based schema.

I guess we can make it a “FHIR-focused” approach by allowing non-FHIR data to be present and queryable along with the FHIR schema. Are there specific examples you see where required OpenMRS data cannot be represented in FHIR format?

bashir · June 24, 2020, 5:37am

I just replied to this doc comment made by @burke, which is relevant to this discussion re. FHIR-only vs FHIR-focused data warehouse.

TL;DR: Yes, we should be able to support non-FHIR data in the warehouse, but there is still some development cost to ETL that data even in the non-FHIR case.