Data Quality Dashboard

bharatak · July 14, 2016, 3:39pm

Hi, At one of our implementation sites, there is a requirement of providing a one-shot view of data inconsistencies in the system and an easy way to fix them. We are planning to leverage this opportunity to build a data quality dashboard. The following are the design considerations

The rules should be extensible & configurable. Implementers of Bahmni should be able to write groovy, sql rules.
The evaluated inconsistencies should be grouped by patient if possible (instead of inconsistencies by rules). Its a good user experience feature to show all the inconsistencies for a patient so that the data managers can fix issues of one patients at a time. The following is the screenshot.

This poses a question of when the rules should be run. Obviously running all the data quality rules in a single API call will not scale. This might demand for a scheduled job to run at a regular interval. At the same time, the users would want to see that the data fixes should result in the record being removed from the inconsistency list.

We have explored data integrity module. The rules are driven mainly through sql here. We are looking for more flexibility in terms of groovy/java if possible. Also, there is no provision of storing the actual patient list that is inconsistent. It would just highlight if the rule is successful or not (Please correct me if I am wrong here).

We would like to take inputs of broader openmrs community in this regard. Our initial thought is as follows.

Have an openmrs scheduler task to run all the groovy files. Store the results of the data like Patient_id, rule_name, notes and addnl_info in some table. This is a temporary table that is cleared before starting every run.
The API call on DQ dashboard will just return the response from this intermediate table.
Once the user fixes the issue (like fixing the form data or updating the drug orders), then (s)he will be able to run all the rules through a REST call for this particular patient. So, that he can see the data being reflected on the UI.

@jteich @darius @burke @mseaton Inviting some people to the conversation.

tomgriffin · July 14, 2016, 8:39pm

Have you thought about enhancing the existing Data Integrity Module with these features?

The current Data Integrity Module is limited to SQL-based checks (samples available here and here). There are checks where SQL makes most sense, such as finding encounters after the patient’s death. SQL becomes a problem as the data becomes more complex (and you end up having to write tons of rules for every possible scenario).

Groovy support would help identify trouble data without having to define as much in terms of expected values. Checks could be built which first profile the data in order to determine reasonable thresholds before identifying outliers.

From a UI standpoint, I completely agree with you: we need to be able to toggle the results between matching patients and rules. The rules view would look similar to the current status screen. We could enhance the patient view to be able to better analyze the data. This may result in better integrity checks being written.

I have a few more thoughts but first wanted to make sure I was understanding everything correctly before potentially sending us even further down the wrong path.

bharatak · July 15, 2016, 12:53am

Thanks @tomgriffin. We see 2 problems in using data integrity module

The current rule based view will just give a Passed/Failed status and wouldn’t store the inconsistent data. From end user’s perspective, they wouldn’t know which data to fix unless we run the rule again.
Collating the output of all the rules and pivoting it by patient is unavailable. This is not possible unless we persist the data somewhere.

By the way, the UI you have pasted looks interesting. Did you already built it in Integrity module? Or are you envisioning how the UI should look like?. Thank you for the response.

darius · July 15, 2016, 1:50am

@bharatak, from peeking very briefly at the Data Integrity module code, I think it does persist the results for asynchronous use (in dataintegrity_result, i.e. IntegrityCheckResult). It looks like they do it with a “UID” instead of just patient_id, though in the use case you describe we probably need a way of doing it at the level of the patient record. (Although for endTB it would more correctly be stored against an episode of care, right?)

Anyway, it may be worth a closer look at whether enhancing the existing module is worthwhile.

Broadly:

+1 to running things on a schedule, with a way to trigger a manual run, and being able to analyze/pivot on the persisted data
I’m not sure it’s really required to rerun the rules for one particular patient; I think the users could probably live without this. (It’s definitely nice to have, but it has implications on how the rules must be written, if you’re to be able to run them against the whole database or a single patient.)
It would be nice if the logic behind a check doesn’t only work in this module, but can also be run/displayed on the patient dashboard in the form of alerts (though in this case not running it real-time might be a problem)
Also, yes, have a patient-centric view seems like a requirement for the workflow I’d expect people to follow when fixing data.

tomgriffin · July 15, 2016, 2:27am

@bharatak: I mocked that UI up when I responded to the post

I proposed enhancing the existing module because I think all this functionality belongs in the same place.

If after digging into the existing code we find that extending it is more trouble than it’s worth, can we explore adding SQL-based checks to the new module and then retiring the old one? Thereby leaving us with one data integrity module capable of both sql and groovy-based checks?

This will allow other enhancements (patient-view UI, better, scheduling and persistence of results, etc…) to be universally available for all checks.

bharatak · July 15, 2016, 3:40pm

Thanks @darius and @tomgriffin. We are doing a design from our side. Will update you on how things are going. We have a strict deliverable and we might have to proceed soon on this for MVP. But will be flexible in terms of feedback in subsequent releases. Will keep you posted.

ssmusoke · July 26, 2016, 9:52pm

@bharatak do u have any updates to share even if from a high level design

bharatak · July 27, 2016, 1:33pm

hi @ssmusoke I’ll respond on this in a couple of days. We have come up with an MVP of the new data integrity module. Will publish it

ssmusoke · July 27, 2016, 1:42pm

@bharatak Looking forward to seeing the MVP

bharatak · July 29, 2016, 10:18am

Hi, The following is a WIP design document on DataQuality Dashboard. Request you to please provide your inputs. Thank you.

ssmusoke · August 10, 2016, 12:23pm

@bharatak I tried to install the bhamni-data-integrity module within Ref App 2.4, but I need to bring in a whole bunch of modules from Bhamni due to a dependency on bhamnicore - In the code I do not see where the dependency is used.

Is there a demo area where I can check out the workings of the module.

bharatak · August 10, 2016, 4:36pm

Hi @ssmusoke, Ideally there is no dependency on bahmnicore. We are planning to make it a separate repo and remove the bahmnicore dependency. Will share the new repo details in a couple of days.

Meanwhile, I will ping you the environment details where you can see it in action personally. These are a couple of rules that are written - here and here.

ssmusoke · August 10, 2016, 5:54pm

@bharatak Awesome, I removed the bahmnicore dependency in config.xml and was able to install it my environment. Looking forward to the new repo and any documentation on writing sample rules for example:

Missing observations based on another observation for example patient started on a treatment without a test or diagnosis recorded, or a diagnosis after treatment is started
Missing encounters
Lab results without tests being ordered etc.

ssmusoke · August 15, 2016, 4:41am

@bharatak quick question is it possible to move the current code to the data integrity module and bump up the major version to the next one. That way there is one module

bharatak · August 15, 2016, 5:32am

We would love to do that. I think it will be the decision of module owners of openmrs-module-dataintegrity. It is a completely re-written module and is just an MVP. We are open for collaboration from the community - both from design and development perspective. Its currently available here. The UI layer is part of bahmniapps. We we would love to move the UI also as a separate app. But if somebody wants to do it as a OWA (or a similar solution), they are more than welcome.

dkayiwa · August 15, 2016, 7:39am

For now, i would say that practically, @ssmusoke is the active developer and hence owner of the data integrity module.

@ssmusoke, since the bahmni module is completely rewritten, have you evaluated doing the opposite? That is harvesting the features from the dataintegrity module into the bahmin one? You may find this easier than the reverse.

ssmusoke · August 15, 2016, 8:24am

@bharatak as @dkayiwa has mentioned I am currently the active module owner for data-integrity.

I would like to suggest the following next steps:

Keep the current openmrs-module-dataintegrity code as the 3.x branch
Move the MVP code to the HEAD and bump it up to 4.0 taking the code contribution as a a major re-write

Do you have any recommendations/suggestions for the getting the app to work for non-Bahmni installs?

bharatak · August 15, 2016, 9:43am

Hi @ssmusoke, this sounds like a plan. We will send a PR as you mentioned. There is no dependency on Bahmni for this OMOD to work.

As mentioned in previous message, the front end is dependent on the bahmni codebase. I want to separate it out of it using openmrs OWA feature. We can have it as part of the same codebase so that the UI and Backend are at one place. This might need some work. What is your opinion?

ssmusoke · August 15, 2016, 9:46am

@bharatak Awesome, the dependency is in the config.xml which requires the module org.bahmni.module.bahmnicore https://github.com/Bahmni/openmrs-module-bahmnidataintegrity/blob/cf723374905097785420c9fd2bb66ccf1365e104/bahmni-dataintegrity-omod/src/main/resources/config.xml#L19

bharatak · August 15, 2016, 9:54am

Yeah… that’s not required… We will be removing it.