GSoC 2017 - Patient Matching 2.0

Hi everyone,

This thread aims at project planning and related discussions in [Patient Matching 2.0 Project] ( Primary Mentor - @burke Backup Mentor - @sgrannis

The objectives of this project are,

1.  Perform patient match taking considerations of the previous run (incremental patient matching).

  • This method will avoid unnecessary record comparisons.

  • A match will be performed upon,       addition of new patients or       changes in patients’ records

  • After an incremental match a separate output will be created.

  • This incremental matching is specific to a strategy.

  • An output from previous run can be selected as the memory for the next run(eg. For strategy “A” there are 50 runs in last 3 months, so it should be possible to pick one output from the 50 as the memory for the next match)

2.  Avoid repeated manual reviews of previous manually reviewed matches

3.  UI changes to reflect above functionalities

About Patient Matching Module: The project source code is at: openmrs-module-patientmatching My blog URL: My repo -

1 Like

Hi @burke with the help of the brainstorm session with you, Dr. Shaun, and Andrew I have finalized this project plan and some important points of the discussion are included in here
Coding Period   May 30, 2017 - August 21, 2017         Working Period         May 30, 2017 - June 26, 2017         Phase 1 Evaluation   June 26, 2017 - June 30, 2017         Working Period         July 1, 2017 - July 24, 2017         Phase 2 Evaluation   July 25, 2017 - July 28, 2017
Problem with the current version is when a patient match is done the module does not consider the output from prior run to compute the new output. Main goal is to refactor this module to perform incremental match instead of isolated matching processes.

1.  A method to perform patient match taking considerations of the previous run (incremental patient matching) [May 30, 2017 - June 26, 2017]

  • A match will be performed upon,       addition of new patients or       changes in patients’ records

  • The basic idea of doing this is to find what are the records that match with the above two criteria and perform a match with those records with every other record       A query to get what had changed/added in the list of patients’ records.( A method Dr. Shaun suggested was to keep a running list of as to what are the potential pairs that don’t match )

  • After an incremental match a separate output will be created.

  • This incremental matching should be specific to a strategy.

  • It should be possible to select between a particular strategy we are going to select as the memory for next patient match (for eg. For strategy “A” there are 50 runs in last 3 months, so it should be possible to pick one output from the 50 as the memory for the next match)

2.  Module should be able to perform patient match with each and every record with it self despite of the output from prior runs (In simple terms not an incremental patient matching) [July 1, 2017 - July 12, 2017]

  • The reason for this because there might be issues with the previous run

3.  UI changes to reflect these functionalities [July 13, 2017 - July 20, 2017]

4.  Testing & Debugging [July 20, 2017 - July 24, 2017]

5.  Implementing Automated Unit Tests & Integration Tests [July 20, 2017 - July 24, 2017]

6.  Code Review + Documentation [July 29, 2017 - August 20, 2017]

@burke any sugestions? :slight_smile:

1 Like

There are a some milestones you could call out here:

  • Come up with query to detect new/changed patients. I think this will have to be generic. In a really fancy version, we would detect only changes in attributes used by the strategy, but I think that level of fanciness is out of scope for now.

  • Refactor module so, by default, it just runs matching on patients who are new/changed since strategy last run.

  • Validating our approach (using small set of demo patients)

This is be a relatively small change (i.e., just exposing an option to ignore date of last run), so hopefully won’t take too long to implement.

In test-driven development (what we should be striving for), tests should be written before code.

Hopefully we’ll be getting in some code review along the way. :slight_smile:

Thank you for the feedback @burke:slight_smile:. I will update my project plan accordingly.

Hi @burke,

I have a confusion regarding the patient matching process. If I do a match with a certain strategy “A”, I will get possible duplicates to be merged as in the following image.

If I rerun the match with the same strategy In the current version of Patient Matching module I will get the same result set unless I have not performed any merge operations.

The confusion is, in the incremental matching process, given that we ignore the merge operations and carry a match with the same strategy, should the module display those possible duplicates or should it ignore those patient records as they were matched earlier, back then in couple of days? (assuming that no patient is added or updated)

According to my perspective those records should not be considered in the incremental patient matching process. Am I right?

Thanks :slight_smile:

I would expect to see any new additions alongside previous results – i.e., each iteration updates a single set of results rather than creating a separate result set. In other words, if a patient matching strategy is run daily for a month, there would be a single set of results for that strategy that may change over the month as patients are added/updated/merged, but the user would not have 30 distinct result sets to navigate.


I trust your exams went well. Nice blog post! :slight_smile:

Some “open source” logistics. I think you may already be doing these things, but just want review:

  • Tickets should be in the PTM project on JIRA. I have made you an admin for this project.
    • Try to define work in small, logical chunks (when possible, try to break features into 1- to 3-day chunks of work). These can be grouped into epics when needed.
    • Epics, when needed to group a set of issues into a larger set of work, should have a clear target so it’s clear when then epic has completed. “Initial support for incremental matching” could be an epic (as long as the acceptance criteria for initial incremental matching support are defined); however “Make patient matching better” or “Patient Matching Strategies” would not be a good epics, since it’s not clear when or if they would ever be complete.
  • You should be working on a fork of the patient matching repo. Please include a link to your fork in your initial post. For remote URLs, your fork would be origin and the “official” patient matching repo would be upstream. This will allow you to submit pull requests to upstream.
    • Per our GitHub conventions, the general approach to work should be:
      1. Create issue in JIRA
      2. Create local branch with issue number as its name (e.g., git checkout -b PTM-1234)
      3. Limit changes in that branch to addressing the specific issue. If you end up doing many commits along the way, consider squashing before submitting pull request.
      4. Submit PR to upstream and add link in issue comment.
      5. Review & merge pull request + update issue.

From blog post:

4. If there are couple of old reports related to the strategy, (? what)should be prompt to user asking which report is considered as the memory for the next run.

I’m hoping we can support incremental updates as seamlessly/intuitively as possible. If we’re going to support more than one output for a strategy, then we would be forced to make the user choose between updating one of those reports vs. creating a new one.

It might be better to separate the output from running a strategy from the current “report” – i.e., making a “report” an export of the current state of a strategy’s output while maintaining only one output per strategy. In this case, the one “output” of a strategy would be interactive (let the user confirm or reject matches) and a “report” would become a read-only artifact.

Hi @burke

My exams are ongoing, until the end of this week. :slight_smile: That is why in these couple of days I couldn’t give my full contribution to this project. I hope you understand :slight_smile:

Thanks for the guidance and I have already created a ticket for one particular task PTM-82. As I mentioned in my blog post I’m currently struggling with that task, and today I asked help from the community and @dkayiwa is helping me to figure out the problem.

From blog post:

  1. If there are couple of old reports related to the strategy, (? what)should be prompt to user asking which report is considered as the memory for the next run.

This one because as, in our chat with Dr. Shaun and Andrew I thought user will be asked to select a starting point for the next run. Sure I will change it then accordingly. :slight_smile:

Thanks :slight_smile:


Hi @burke

PTM-82 ticket has been fixed. It took little longer as I had to refactor the class and few other classes as well. PR: Would you able to review this? :slight_smile:

Hi @burke

I have already completed two main parts of the Patient Matching 2.0 project. The two tasks are, PTM-82 PTM-83 PR: I am currently working on PTM-84. Which I believe to be the last biggest part of this project. The main target of this task is to perform patient match with two datasources. For the incremental patient matching it is necessary to have two datasources, one is for all the patients and the other one is for the patients that are fetched considering the date changed and date created. I have to implement some methods and change the code to perform this because the current version only supports for the deduplication of records when dealing with a database.

Thanks :slight_smile:

Hi @burke

Up to now I have completed most of the tasks that I have included in the project plan. PTM-82 - Load patients for the incremental matching PTM-83 - Generate and save reports in incremental patient matching process PTM-84 - Perform patient match with two datasources

Pull requests,


This is how it looks like in the application after these changes.

Run a report with the configuration name "test1"

A report will be created adding the incremental-report text to the configuration name.

This is how it looks like when a patient match is performed at the first time

Added a patient which shows a match with an existing record

Run the report again (Incremental patient matching)

The same incremental-report-test1 report will be updated.

New patient is added which do not exhibit any matching property with existing records

Then there will be no changes in the report.

Add another patient

Report will be updated as the following image

Update an existing patient in a way that it will show matching properties with existing records

The updated patient will be added to the same group in the report.

Every changes happen to the same patient matching report in this case it is incremental-report-test1.

1 Like

Hi @burke :slight_smile:

I have already made my 4th PR: The above PR is to allow the user to select or deselect whether the run should be incremental or not.

The web page looks like as follows,

By default the Incremental Match checkbox is checked. User can select/deselect and run the match.

Hi @burke :slight_smile:

I have completed the PTM-86 and the PR can be found here. This task is to update the report whenever the patients in the report are updated in a way that those patients no longer exhibit matching properties with the rest of the patients.

For example, Suppose there is a matched pair (say patientA and patientB). User updates the patientA in way that patientA does not exhibits matching properties with patientB. Then the matched pair in the report will be removed.

More details about this can be found in my last week blog post.

Hi @burke :slight_smile:

I have completed everything according to my project plan (and some more :wink: ). I just created the last pull request - which is for the PTM-88 issue.

To give a brief summary, I was able to complete every task mentioned in my project plan three weeks before. (Mainly incremental patient matching) Since I had more time I created two issues PTM-87 and PTM-88 and completed them as well.

PTM-87 : to merge patients in the patient matching report PTM-88 : to declare matching records in the report as non-matching patients. By doing this, these matches will not be considered for the next report generation process.

Pull Requests:

PTM-82: Functionality to load patients considering the date created and date changed PTM-83: Save and update incremental patient matching report to the database PTM-84: Functionality to support for two datasources PTM-85: Functionality to select or deselect incremental match PTM-86 : Remove matching pairs from the report when patients are updated PTM-87 : Functionality to Merge Patients in the report PTM-88 : Functionality to exclude non-matching patients PTM-89 : Ignore voided patients when running a patient match

I will give you a full update ASAP

Thanks :slight_smile: