Data Migration & ETL Module

Hello people, I have been browsing through unassigned projects and have two questions:

1: How does one begin working on a problem of interest? My special interest is here.

2: Do contributors have to be GSoC students?

I am also wondering whether the eSaude team is working on the same problem here.

Hello team. I would like to work on something related to this task, though limited to ETL. I envision generating ETL tables from the HTML forms in the htmlformentry module. I currently use a solution built on a set of stored procedures that create and manage ETL tables, and this has improved report development and generation time by more than 95%.

I would especially welcome input from experienced devs (@dkayiwa, @mseaton, @wyclif, @darius), just to name a few.

Below is my story:

I am Antony Ojwang’ from Kenya, currently pursuing an MSc in Applied Computing (Health Informatics) at the University of Nairobi. I have prior experience working with OpenMRS as a developer at different organizations, including AMPATH, I-TECH Kenya and KEMRI-RCTP-FACES, and currently at UCSF Global Programs for Research and Training. I have developed a number of OpenMRS-based products, mainly revolving around reporting. I am in the process of identifying my MSc project and am keen to design and develop an ETL for OpenMRS based on HTML forms. The aim is for every OpenMRS implementation to have basic flat tables that it can easily report from and build on if other derived columns or tables are needed.

My choice of project has been informed by difficulties I have personally experienced while querying data in the EAV data model. I have occasionally developed flat tables to simplify queries and improve query execution time.

My solution will therefore extend the htmlformentry module to translate htmlforms’ definitions into flat tables and then keep those tables populated with data from the EAV data model. I know this will be challenging, but I feel I have the motivation and all that it takes to get it done under the guidance of the community. Please advise me on the following aspects of the project:

  1. Suitability and feasibility of the project
  2. Suitable approaches/workflows
  3. Possible contacts to help with more ideas and design

Thanks and I look forward to your feedback.

Thanks @aojwang for the initiative! Do you have any reason for basing this on a form entry technology instead of the obs table, where data is ultimately stored regardless of which form entry technology was used?

@dkayiwa thanks so much for the prompt response. I only thought form entry could help because it is what is currently used. What I would like is a generic way to read data out of the obs table into a flat table, and I would really appreciate any suggestions on this.
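To make the question concrete, here is a minimal sketch of what I mean by flattening, assuming a heavily simplified obs table (the real one has many more columns) and a hypothetical mapping from concept IDs to flat column names. The concept IDs, column names and sample rows are illustrative only. It uses SQLite in memory so it can be run anywhere, and pivots one row per observation into one row per encounter with conditional aggregation.

```python
import sqlite3

# Simplified stand-in for the OpenMRS obs table (assumption: only a few columns kept).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE obs (
        obs_id        INTEGER PRIMARY KEY,
        person_id     INTEGER,
        encounter_id  INTEGER,
        concept_id    INTEGER,
        value_numeric REAL,
        value_text    TEXT
    )
""")
conn.executemany(
    "INSERT INTO obs (person_id, encounter_id, concept_id, value_numeric, value_text) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 5089, 62.5, None),
        (1, 10, 5090, 170.0, None),
        (1, 10, 160632, None, "follow-up in 2 weeks"),
        (2, 11, 5089, 80.0, None),
    ],
)

# Hypothetical mapping: concept_id -> (flat column name, obs value column to read).
COLUMNS = {
    5089: ("weight_kg", "value_numeric"),
    5090: ("height_cm", "value_numeric"),
    160632: ("notes", "value_text"),
}


def build_pivot_sql(columns):
    """Build a query that turns one obs row per value into one row per encounter."""
    parts = [
        f"MAX(CASE WHEN concept_id = {int(cid)} THEN {value_col} END) AS {name}"
        for cid, (name, value_col) in columns.items()
    ]
    return (
        "SELECT encounter_id, person_id, " + ", ".join(parts)
        + " FROM obs GROUP BY encounter_id, person_id ORDER BY encounter_id"
    )


flat_rows = conn.execute(build_pivot_sql(COLUMNS)).fetchall()
for row in flat_rows:
    print(row)
```

Encounters that lack a given concept simply get NULL in that column. Repeated observations of the same concept within one encounter would collapse to a single value here, so a real solution would need a rule for those.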

@aojwang definitely interested in anything you might come up with… having a way to simplify the EAV model into something easier to query is something we have been working on as well.

You might be interested in checking out some recent exploratory work we have done in using Pentaho to set up a pipeline to flatten the OpenMRS data structure as a first step and then convert it into a star schema. This is definitely still a work in progress, but you can check it out here:

Also, although I agree with Daniel that it’s worth thinking about the value of limiting your solution to data entered with a specific form entry technology, the idea of using htmlforms to help with the flattening is intriguing, and I would be interested in hearing more.

Take care, Mark

Hi @mogoodrich. This was very helpful and a good starting point for me. As of now, I only have the idea of using an htmlform’s definition to create a corresponding flat table and to manage it (insert/update) with data from the EAV model. I will go through the Pentaho work for more insight.
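As a rough illustration of that idea, the sketch below reads the `<obs conceptId="..."/>` tags out of a small form definition and derives a CREATE TABLE statement from them. The sample form, the `concept_<id>` column naming and the helper functions are all hypothetical, and it assumes the form is well-formed XML with numeric concept IDs only (forms that reference concepts by UUID or mapping would need extra handling).

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal htmlform definition used only for this example.
SAMPLE_FORM = """
<htmlform>
  <section headerLabel="Vitals">
    <p>Weight: <obs conceptId="5089"/></p>
    <p>Height: <obs conceptId="5090"/></p>
    <p>Weight again: <obs conceptId="5089"/></p>
  </section>
</htmlform>
"""


def concept_ids_from_form(form_xml):
    """Return the distinct numeric conceptIds of <obs> tags, in document order."""
    root = ET.fromstring(form_xml)
    seen = []
    for obs in root.iter("obs"):
        cid = obs.get("conceptId")
        # Skip tags without a plain numeric concept ID (an assumption of this sketch).
        if cid and cid.isdigit() and int(cid) not in seen:
            seen.append(int(cid))
    return seen


def create_table_sql(table_name, concept_ids):
    """Build DDL for a flat table with one column per concept on the form."""
    cols = ["encounter_id INTEGER PRIMARY KEY", "person_id INTEGER"]
    cols += [f"concept_{cid} TEXT" for cid in concept_ids]
    return f"CREATE TABLE {table_name} ({', '.join(cols)})"


ids = concept_ids_from_form(SAMPLE_FORM)
ddl = create_table_sql("flat_vitals", ids)
print(ddl)
```

Every column is typed TEXT here for simplicity; choosing proper types would mean looking up each concept’s datatype, which this sketch does not attempt.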

Does anyone know of academic research papers that discuss the problems with OpenMRS’ EAV model?

@mogoodrich I will keep an eye on your work and will share what I have done once I have properly mapped out the idea.

Thanks, Antony