Sql dump based on Velocity renderer of openmrs reporting module

bharatak · March 30, 2016, 6:13pm

Hi, for one of our implementations there is a requirement of generating a full data export of all the forms,patients, drug details in the system. We are planning to use openmrs-reporting module for this use case. We are planning to create one dataset per form using various standard datasetdefinitions. The output of the export should be a .sql file which contains a de-normalised schema (visualising it to be one table per dataset). I am thinking of a custom renderer based on velocity template which can generate commands like “create database”, “create table”, "insert into " . I would like to validate my approach and also would like to know if we have a readymade template that someone has already created for this use case. Thanks a lot.

judy · March 31, 2016, 4:17am

Interested in such a solution

bharatak · March 31, 2016, 4:56am

tagging some openmrs reporting module devs … @mseaton @mogoodrich @darius

vinkesh · March 31, 2016, 5:32am

Can we rather create a temp table and use mysqldump to create .sql file?

maurya · March 31, 2016, 5:56am

If the requirement is to share the data between two openmrs implementations, I would recommend taking a look at and using/developing Metadata Sharing module. @raff might be able to guide you better on this module.

bharatak · March 31, 2016, 6:32am

I’ll try to explain the requirement a bit clearly. One of our customers wants the dump of all data filled at a site in a simpler de-normalized database format. (for example one form per table).

For example, if there is a form contains Blood Pressure > Systolic (120), Diastolic (80), they are stored as Obs in OpenMRS. What is expected in the output schema is a “Blood Pressure” table with Systolic & Diastolic as columns and the values of 120 & 80 as row (might be per encounter). The other columns in this table could be PatientID, PersonAttribute Details etc

One way, this can be done is by using openmrs-module-reporting which supports the concept of ReportDefinition. One ReportDefinition can contain multiple DataSetDefinitions. This module supports ReportRenderers (like Excel,CSV,TSV etc) which can be used to publish the datasets. This is one example of FullDataExport done for Mirebalais implementation by PIH.

My question here is about a custom renderer written using VelocityTemplateEngine which can output a SQL file in a flat structure (re-iterating that it is not openmrs schema that is expected as output). Hope this clarifies.

darius · March 31, 2016, 12:57pm

@bharatak, I broadly like this idea. And I agree that it makes sense to write a ReportRenderer in the reporting module to generate these SQL create/insert statements based on all the DataSets in a ReportData object. I can imagine others wanting to use this feature also.

One reason you might not want to use the VelocityTemplateEngine you linked to is that it renders everything to a String and returns it, but I would presume that the SQL output could get very large, and you’d rather write it in a streaming fashion, which is supported by the ReportRenderer interface’s render(..., OutputStream) method.

Also, I imagine you want the renderer to take a configuration option (maybe this is a RenderingMode, I forget) where it zips the output as you stream it.

mseaton · March 31, 2016, 9:39pm

@bharatak - sorry for the delayed response. I also like this idea. I actually have something very similar written up that I had thought was in a ticket, but I can’t seem to find it. What we had been thinking was a Renderer that actually writes to the database, but I think your idea of producing a SQL file that could then be sourced into the DB right away, or a later point in time, is good.

As far as what has already been written, the existing TextTemplateRenderer would meet the needs that you indicated - it already has built in usage of any installed template engine (which includes Groovy and Velocity out of the box) and has access to all of the report data, data set names and column names, and a variety of utility methods.

As @darius indicates though, if this doesn’t perform and you need a more streaming-type solution, I would think you could accomplish this with a custom ReportRenderer that streams the output as it iterates through the dataset, and simply uses the names associated with each data set in the report as the table names, and the names of each column within each data set as the column names. If you need to configure these differently, that could easily be accomplished via a ReportDesign that the Renderer supports.

Let me know if I can help support this work in any way, or if you’d like to discuss in more detail.

Mike

bharatak · April 1, 2016, 4:38pm

Thanks @mseaton and @darius. Will keep you posted on updates.