Hi all,
Folks at AMPATH (JJ and Allan) and Antara are going to tackle something along these lines in the next 90 days. We had the following thoughts:
- We will design an analytics-friendly schema for the data that is NOT the schema of the forms (one option raised in the thread). Instead it would be something designed intentionally for analytics, without being specific to any one implementation. As a strawman starting point from our brainstorm: a sparse 4D data frame that is roughly Person x Concept x Time x Location.
- We’d love your input on designing such a data frame.
- There are existing, purpose-built mechanisms for populating the data frame, whether via the openmrs-rest-api, a database query orchestrated with Airflow, or a stream-processing stack such as Kafka (with Flink, NiFi, or something else?).
- If there’s enough interest, maybe we can run a contest among community members, each taking a different POC approach. We would run every approach on a large data set (e.g. AMPATH’s database?) and compare their performance characteristics to decide.
- On the other side of this data frame definition, we’d like to create a set of convenience functions for aggregating the Obs-dataframe over time, locations, cohorts of patients, and groups of concepts, possibly presented as a Jupyter/Spark notebook.
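To make the strawman a bit more concrete, here is a minimal sketch of how the sparse Person x Concept x Time x Location frame and one aggregation helper might look. It assumes a long ("tidy") layout where only non-empty cells are stored as rows. The names `Obs`, `aggregate`, the axis labels, the month bucketing, and the concept codes are all hypothetical illustrations, not an agreed design. It uses only the Python standard library so it stays self-contained; a real version would more likely sit on pandas or Spark DataFrames.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Obs:
    """One non-empty cell of the hypothetical Person x Concept x Time x Location frame."""
    person_id: int
    concept: str
    obs_date: date
    location: str
    value: float


def aggregate(
    rows: Iterable[Obs],
    by: Tuple[str, ...],
    agg: Callable[[List[float]], float] = mean,
    concept_groups: Optional[Dict[str, str]] = None,
    person_filter: Optional[set] = None,
) -> Dict[tuple, float]:
    """Group sparse obs rows along the chosen axes and apply `agg` to each group.

    by: any of "person", "concept", "month", "location" (hypothetical axis names).
    concept_groups: optional mapping concept -> group label, to roll up related concepts.
    person_filter: optional cohort (set of person_ids) to restrict the aggregation to.
    """
    buckets: Dict[tuple, List[float]] = defaultdict(list)
    for o in rows:
        # Cohort restriction: skip anyone outside the requested set of patients.
        if person_filter is not None and o.person_id not in person_filter:
            continue
        # Map each concept to its group label, falling back to the concept itself.
        concept = concept_groups.get(o.concept, o.concept) if concept_groups else o.concept
        axes = {
            "person": o.person_id,
            "concept": concept,
            "month": (o.obs_date.year, o.obs_date.month),  # time bucketed by calendar month
            "location": o.location,
        }
        key = tuple(axes[a] for a in by)
        buckets[key].append(o.value)
    return {k: agg(v) for k, v in buckets.items()}


# Illustrative rows only; concept codes and locations are made up.
rows = [
    Obs(1, "WEIGHT_KG", date(2018, 1, 5), "Clinic A", 60.0),
    Obs(1, "WEIGHT_KG", date(2018, 1, 20), "Clinic A", 62.0),
    Obs(2, "WEIGHT_KG", date(2018, 1, 9), "Clinic B", 70.0),
    Obs(2, "CD4_COUNT", date(2018, 2, 1), "Clinic B", 350.0),
]

print(aggregate(rows, by=("concept", "month")))
# {('WEIGHT_KG', (2018, 1)): 64.0, ('CD4_COUNT', (2018, 2)): 350.0}
```

The same helper covers the other slices mentioned above. For example, `aggregate(rows, by=("person",), agg=len, person_filter={2})` counts observations for a cohort, and passing `concept_groups` rolls several concepts up into one group before aggregating. Keeping the frame in long form avoids materializing the mostly empty 4D cube.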
Curious to hear your thoughts.