Background
Many OpenMRS distributions are now resorting to the use of flat tables for reporting needs. These tables serve the needs for both standard and ad-hoc reporting and are in most cases manually created from raw SQL queries designed and executed by developers/database specialists. The process involves writing queries to create and populate tables and incrementally update the tables with changes in the transactional database, openmrs.
Different distributions, however, design the flat tables differently.
Method 1: Report(s) driven tables with pre-calculated variables - This approach produces tables tailor-made to feed specific reports and will in most cases contain a number of pre-calculated variables. A table is designed to contain variables collected from multiple data entry forms.
This approach has proved to be faster to design and execute and additionally produce ready values for the target report(s). The downside is that it doesnāt readily support ad-hoc report requests since not all data variables make it to the tables.
Method 2: One-table-per-form - This approach produces a table for every configured form in the system with a table column pointing to a data element in a data entry form. A table is, therefore, designed to store raw information from data entry forms.
This approach readily supports ad-hoc reporting since the tables are designed to store all data variables in their raw forms as recorded in the data entry forms. The one-table-per-form also simplifies the architecture of the reporting tables to end-users for any ad-hoc data extraction needs. Flattening of all forms is, however, slow compared to method 1 above.
Method 3: A hybrid of methods 1 and 2 - This will result in a set of tables generated based on method 1 and another set based on method 2.
All the approaches simplify the OpenMRS data model to the end-user and make it easier and faster to extract data and reports.
Exploring an automated design for an out-of-the-box ETL solution for OpenMRS
The manual process of designing and developing ETL queries is time-consuming, tedious, complex, and not scalable, and as a community, there is a need to design an out-of-the-box ETL solution that every distribution can adapt and configure as required. In my thoughts, there are two possible ways through this we can achieve this and they include:
-
Exploring OpenMRSā form technology i.e. HTML form entry and JSON form generators, and use the generated markup to inform the design and development of ETL tables.
-
Decompose the obs table into row-per-encounter/row-per-form through the support of the concept dictionary. This is ideal for distributions that donāt use the standard form technologies for data entry
These are not refined conclusions and still requires a lot and focussed deliberations for a good product design. I did a small prototype on HTML forms in KenyaEMR based on method 1 and the write-up can be accessed here.
Please note that there were deliberations on this topic during the conference in Mozambique and the feedback was amazing.
What do others think about this subject? @burke @ssmusoke @ningosi @jdick @mseaton @mogoodrich @nkimaina @wanyee @patryllus @dkayiwa