Evaluating Kibana for reporting and visualization

raff · February 21, 2017, 12:50pm

Let’s try out Kibana for reporting and visualization of data in OpenMRS Reference Application. Some metrics we could start from:

Number of patients admitted by day on a line graph
Diagnoses frequency pie chart
Number of patients by origin on a map

The steps we need to take:

Setup RA 2.6 (using SDK) with demo data for at least 1k patients with observations using the referencedemodata module as explained here https://wiki.openmrs.org/display/docs/Reference+Application+2.3#ReferenceApplication2.3-Demopatients We’ll need to make sure demo data is generated with the info we’d like to report on so changes in the referencedemodata module may be required.
Create docker config (openmrs-sdk:build-distro)
Add Kibana docker service to docker-compose
Connect Kibana to MySQL docker service
Write SQL queries to extract data from MySQL to Kibana
Setup basic charts

@lluismf, are you okay with that approach? Would you be willing to setup all the parts and create charts?

lluismf · February 21, 2017, 8:45pm

A couple of things:

We have to install ElasticSearch and Logstash too (or the full ELK stack)
Why do we need Docker ? If the goal is to pack a demo in a docker image, I’d leave it for phase 2.

For the moment I’ll start installing the ELK stack

darius · February 21, 2017, 10:07pm

Do we really need Logstash too? I would think that E+K is enough.

You don’t “need” docker, but it’s the easiest way to make this easy for anyone interested in playing around to replicate.

I think this is a really neat idea. But I think we should be a bit more directed in how we approach this.

First, let’s make one more attempt to reach out to implementations on the Implementing list and see if someone wants to provide some real deidentified data, in exchange for getting some analyses coded up. (Randomly-generated data is not going to product pretty charts.)

Also, I think that @jdick and AMPATH may already be putting OpenMRS data into ElasticSearch for analysis, and maybe we could use something they’ve done as a starting point. Also, @mogoodrich and @mseaton were recently experimenting with something along these lines, but I think they went another way.

ElasticSearch + Kibana are a great combination, but the underlying model is to search through indexed documents. It’s quite flexible, and you can aggregate these results and replicate some of what you’d otherwise do with OLAP cubes, you don’t get the same performance and memory usage characteristics. (Specifically, if you were to index all the observations into ES it’s going get huge, and require more resources than OpenMRS itself. That seems impractical for a typical implementation, so from the beginning I would avoid this.)

If I were doing this I would look into this workflow:

triggered by an atom feed, schedule, or some other event mechanism… [implement this part last]
fetch key events in a simplified view (not the pure OpenMRS data model) [ideally produce these via Reporting and Reporting REST, but you could hack it in the first pass]
thus we generate a quick spike at an analytic data model, indexed in ElasticSearch
build Kibana graphs on top of this
then see if you can embed the Kibana graphs in the RA UI via an OWA (or alternately, copy the ES queries that Kibana is doing into an OWA + use some graphing library)

PS- Here’s a completely different direction, which is interesting to a different kind of end-user: Introduction to Exploratory Data Analysis of Bahmni using R | by Karrtik Iyer | Bahmni Blog | Medium

chagara · February 22, 2017, 6:36am

well you need something to receive the logs and process them using GROK. You can try graylog since it lets you send directly to it and it still uses elasticsearch for the indexes. I suggest you use the elastic *beats to collect the logs,metrics or any information.

lluismf · February 22, 2017, 8:20am

According to a colleague, it’s the easiest way to import rows from a relational database into Elastic. Maybe it’s killing flies with guns, I haven’t tried yet. The alternative is to build an ETL process to do the same, but with Logstash you don’t need to code.

I found this article https://www.quora.com/Whats-the-best-way-to-setup-MySQL-to-Elasticsearch-replication About half the answers recommend using LogStash. for the ETL process.

I see, but still we can leave it for later. I have a Windows 7 machine, you don’t want a docker image of that right?

raff · February 22, 2017, 11:46am

I can’t recall us being ever able to get deidentified data from an implementation (yes, let’s make one more attempt!), but we may get them to try out what we accomplish here and have an incentive to collaborate further on that setup. It is why I think it is important to do it in Docker so it can be easily run by others on real data.

There are official docker images for the whole stack at elastic · GitHub with documentation so it shouldn’t be hard.

I’m very excited to hear your stories!

Has ever anyone evaluated how much it is in practice? I figure you can always limit to a year of observations and still find it useful.

@lluismf, how about @adamg will work on creating the docker setup and you take it from there… @adamg, would you be interested in creating docker-compose with RA 2.6 and ELK (full stack)?

adamg · February 22, 2017, 12:21pm

@raff I can try Do you want to do it based on docker-compose generated by SDK ?

raff · February 22, 2017, 1:25pm

@adamg, yes, it would be the most straightforward. You can even write yet another docker-compose file for Kibana to be combined with RA instead of modifying the generated one. Thanks!

lluismf · February 22, 2017, 8:39pm

Tell me stubborn but I prefer to work natively instead of a virtual machine. I will create a logstash .CONF file with the queries and Kibana allows to export its artifacts to JSON files, and it’ll be easy to import all of them in the Docker image built by @adamg

lluismf · February 22, 2017, 8:44pm

Any database used for business intelligence is bigger than the online one, because of the de-normalization. So yes, it’ll be even more huge unless instead of replicating raw obs we do just aggregated (grouped) data. There are many options.

raff · February 23, 2017, 11:23am

Fair enough!

darius · February 23, 2017, 10:40pm

This is what I meant to say.

If your approach is “index all the patients/visits/encounters/obs in ElasticSearch” then you’ll be able to do some cool stuff with ES, but it will probably explode when you try it for an implementation with millions of obs.

But every single one of the metrics @raff mentioned in the first post can be calculated without indexing the entire obs table.

(Basically what I’m saying is don’t index the entire obs table for analytics/reporting/visualization; everything else is of manageable size, but not obs.)

lluismf · February 23, 2017, 10:45pm

I just found Demo Data - Resources - OpenMRS Wiki I will install Mysql and this database, it’s enough for the ETL.

I will probably have doubts about the queries needed to create the indexes.

adamg · February 24, 2017, 10:41am

@raff I’ve created a repo with the current config for RA an ELK. This is my first time working with ELK so please let me know if there is anything that I should change

raff · February 24, 2017, 11:59am

Could you please use https://github.com/openmrs/openmrs-contrib-ansible-docker-compose/blob/master/files/demo/dbdump/demo-refapp-2.5.sql instead? It’s way newer than anything I see on that page. It has 500 patients.

raff · February 24, 2017, 12:01pm

@adamg, I left you a comment about putting ELK containers in a separate docker-compose so it can be optionally started with:

docker-compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.elk.yml up

lluismf · February 24, 2017, 12:10pm

The one I loaded has 5000 patients and 500.000 observations. Isn’t better with high volumes? To see how fast logstash loads in EL.

lluismf · February 24, 2017, 12:10pm

Good job! Just out of curiosity, did you have to install the ELK stack before, or Docker knows how to do it?

raff · February 24, 2017, 12:11pm

@lluismf, was it demo-2.0.1.sql.zip?

I guess we could upgrade it to RA 2.5… It will be faster than creating 5000 patients with the demo data module.

adamg · February 24, 2017, 12:14pm

@lluismf I did not install anything manually