Evaluating Kibana for reporting and visualization

Let’s try out Kibana for reporting and visualization of data in the OpenMRS Reference Application. Some metrics we could start with:

  1. Number of patients admitted by day on a line graph
  2. Diagnoses frequency pie chart
  3. Number of patients by origin on a map

The steps we need to take:

  1. Set up RA 2.6 (using the SDK) with demo data for at least 1k patients with observations, using the referencedemodata module as explained at https://wiki.openmrs.org/display/docs/Reference+Application+2.3#ReferenceApplication2.3-Demopatients. We’ll need to make sure the demo data is generated with the info we’d like to report on, so changes in the referencedemodata module may be required.
  2. Create docker config (openmrs-sdk:build-distro)
  3. Add Kibana docker service to docker-compose
  4. Connect Kibana to MySQL docker service
  5. Write SQL queries to extract data from MySQL to Kibana
  6. Set up basic charts
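As a rough sketch of steps 3–4, the Kibana and Elasticsearch services could be layered on top of the SDK-generated docker-compose file roughly like this (image tags, credentials, and the `mysql` service name are assumptions, not a tested config):

```yaml
# docker-compose.elk.yml -- hypothetical overlay for the SDK-generated file.
# Assumes the generated compose file defines a `mysql` service; versions are
# illustrative only.
version: '2'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.16
    environment:
      - discovery.type=single-node
  kibana:
    image: docker.elastic.co/kibana/kibana:5.6.16
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```

Something (e.g. Logstash, or a custom ETL step) would still need to move data from MySQL into Elasticsearch; Kibana itself only talks to ES.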

@lluismf, are you okay with that approach? Would you be willing to set up all the parts and create the charts?


A couple of things:

  1. We have to install ElasticSearch and Logstash too (or the full ELK stack).
  2. Why do we need Docker? If the goal is to pack a demo in a Docker image, I’d leave it for phase 2.

For the moment I’ll start installing the ELK stack

Do we really need Logstash too? I would think that E+K is enough.

You don’t “need” Docker, but it’s the easiest way for anyone interested in playing around to replicate the setup.

I think this is a really neat idea. But I think we should be a bit more directed in how we approach this.

First, let’s make one more attempt to reach out to implementations on the #implementing list and see if someone wants to provide some real deidentified data, in exchange for getting some analyses coded up. (Randomly-generated data is not going to produce pretty charts.)

Also, I think that @jdick and AMPATH may already be putting OpenMRS data into ElasticSearch for analysis, and maybe we could use something they’ve done as a starting point. Also, @mogoodrich and @mseaton were recently experimenting with something along these lines, but I think they went another way.

ElasticSearch + Kibana are a great combination, but the underlying model is to search through indexed documents. It’s quite flexible, and you can aggregate these results and replicate some of what you’d otherwise do with OLAP cubes, but you don’t get the same performance and memory usage characteristics. (Specifically, if you were to index all the observations into ES, it’s going to get huge and require more resources than OpenMRS itself. That seems impractical for a typical implementation, so from the beginning I would avoid this.)

If I were doing this I would look into this workflow:

  • triggered by an atom feed, schedule, or some other event mechanism… [implement this part last]
  • fetch key events in a simplified view (not the pure OpenMRS data model) [ideally produce these via Reporting and Reporting REST, but you could hack it in the first pass]
  • thus we generate a quick spike at an analytic data model, indexed in ElasticSearch
  • build Kibana graphs on top of this
  • then see if you can embed the Kibana graphs in the RA UI via an OWA (or alternately, copy the ES queries that Kibana is doing into an OWA + use some graphing library)
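To make the second bullet concrete: the “simplified view” indexed into ES would be a flat analytic document per event, rather than the normalized OpenMRS tables. All field names below are hypothetical, just to illustrate the shape:

```json
{
  "event_type": "admission",
  "event_datetime": "2017-02-01T10:15:00Z",
  "patient_age": 34,
  "patient_gender": "F",
  "location": "Inpatient Ward",
  "diagnosis": "Malaria"
}
```

Documents like this are cheap to aggregate in Kibana (counts per day, pie by diagnosis) without exposing the raw obs table.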

PS- Here’s a completely different direction, which is interesting to a different kind of end-user: https://medium.com/bahmni-blog/introduction-to-exploratory-data-analysis-of-bahmni-using-r-6c186fd6f010#.j9taxud81


Well, you need something to receive the logs and process them using grok. You could try Graylog, since it lets you send directly to it and it still uses Elasticsearch for the indexes. I suggest you use the Elastic *beats to collect the logs, metrics, or any other information.

According to a colleague, it’s the easiest way to import rows from a relational database into Elastic. Maybe it’s overkill, I haven’t tried it yet. The alternative is to build an ETL process to do the same, but with Logstash you don’t need to write code.

I found this article: https://www.quora.com/Whats-the-best-way-to-setup-MySQL-to-Elasticsearch-replication About half the answers recommend using Logstash for the ETL process.
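For reference, a Logstash pipeline pulling rows out of MySQL with the JDBC input plugin looks roughly like this. The connection details, driver path, query, and index name are placeholders, not a tested config:

```conf
# Hypothetical Logstash pipeline: MySQL -> Elasticsearch.
input {
  jdbc {
    jdbc_driver_library => "/opt/mysql-connector-java-5.1.40.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/openmrs"
    jdbc_user => "openmrs"
    jdbc_password => "openmrs"
    # One document per row; the SELECT defines the analytic view.
    statement => "SELECT patient_id, date_started FROM visit WHERE voided = 0"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "openmrs-visits"
  }
}
```

No code to write, just a .conf file — which is the appeal of Logstash over a hand-rolled ETL.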

I see, but we can still leave it for later. I have a Windows 7 machine, you don’t want a Docker image of that, right?

I can’t recall us ever being able to get deidentified data from an implementation (yes, let’s make one more attempt!), but we may get implementations to try out what we accomplish here and have an incentive to collaborate further on that setup. That is why I think it is important to do it in Docker, so it can be easily run by others on real data.

There are official Docker images for the whole stack at https://github.com/elastic?utf8=✓&q=-docker&type=&language= with documentation, so it shouldn’t be hard.

I’m very excited to hear your stories!

Has anyone ever evaluated how big it gets in practice? I figure you can always limit it to a year of observations and still find it useful.

@lluismf, how about @adamg works on creating the Docker setup and you take it from there? @adamg, would you be interested in creating a docker-compose with RA 2.6 and ELK (the full stack)?

@raff I can try :smiley: Do you want to do it based on the docker-compose generated by the SDK?

@adamg, yes, that would be the most straightforward. You can even write yet another docker-compose file for Kibana to be combined with the RA one instead of modifying the generated one. Thanks!

Call me stubborn, but I prefer to work natively instead of in a virtual machine. I will create a Logstash .conf file with the queries, and since Kibana allows exporting its artifacts to JSON files, it’ll be easy to import all of them into the Docker image built by @adamg.


Any database used for business intelligence is bigger than the online one, because of the de-normalization. So yes, it’ll be even bigger unless, instead of replicating raw obs, we load just aggregated (grouped) data. There are many options.


Fair enough!

This is what I meant to say.

If your approach is “index all the patients/visits/encounters/obs in ElasticSearch” then you’ll be able to do some cool stuff with ES, but it will probably explode when you try it for an implementation with millions of obs.

But every single one of the metrics @raff mentioned in the first post can be calculated without indexing the entire obs table.

(Basically what I’m saying is: don’t index the entire obs table for analytics/reporting/visualization; everything else is of manageable size, but obs is not.)
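For instance, metric #1 from the first post (admissions per day) can be computed as a small aggregate directly in MySQL before anything is indexed. The query below assumes the standard OpenMRS `visit` table; how an “admission” is identified varies per installation, so the filter is a placeholder:

```sql
-- Admissions per day, aggregated in MySQL before indexing.
-- Add a visit_type_id (or encounter type) filter matching whatever
-- represents an admission in your installation.
SELECT DATE(date_started) AS admission_date,
       COUNT(*)           AS admissions
FROM visit
WHERE voided = 0
GROUP BY DATE(date_started)
ORDER BY admission_date;
```

Indexing a few hundred daily rows like these into ES is trivial compared to millions of raw obs.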


I just found https://wiki.openmrs.org/display/RES/Demo+Data. I will install MySQL and this database; it’s enough for the ETL.

I will probably have doubts about the queries needed to create the indexes.

@raff I’ve created a repo with the current config for RA and ELK. This is my first time working with ELK, so please let me know if there is anything I should change :slight_smile:


Could you please use https://github.com/openmrs/openmrs-contrib-ansible-docker-compose/blob/master/files/demo/dbdump/demo-refapp-2.5.sql instead? It’s way newer than anything I see on that page. It has 500 patients.

@adamg, I left you a comment about putting ELK containers in a separate docker-compose so it can be optionally started with:

docker-compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.elk.yml up

The one I loaded has 5,000 patients and 500,000 observations. Isn’t it better to test with high volumes, to see how fast Logstash loads into Elastic?

Good job! Just out of curiosity, did you have to install the ELK stack beforehand, or does Docker know how to do it?

@lluismf, was it demo-2.0.1.sql.zip?

I guess we could upgrade it to RA 2.5… It will be faster than creating 5000 patients with the demo data module.

@lluismf I did not install anything manually.