GSoC 2017: Patient Matching 2.0

Hello All,

I’m Acha Bill, a final-year software engineering student at the University of Buea, Cameroon. I’ve been contributing to OpenMRS for some time.

I am interested in the Patient Matching 2.0 project for GSoC 2017. I’m installing the module and going through the code on GitHub.

Hi @sgrannis, @burke, I’m Yasara, a final-year student in the CSE department at the University of Moratuwa. I’ve been getting familiar with OpenMRS for some time now, and I am also interested in working on the Patient Matching 2.0 project for GSoC 2017. I have configured the current patient matching module. Please guide me.

Hi @ayd, I think there are some requirements for GSoC students. Students who want to participate in GSoC should earn the dev/1 badge. Once you complete your profile, you get dev/null. After that, you need to fix some bugs and take the OpenMRS quiz to earn dev/1. Please follow the Getting Started as a Developer guide, and do this first to improve your chances of being selected. :slight_smile:


Hi @sgrannis, @burke

I am Lahiru Jayathilake from the Computer Science and Engineering Department, University of Moratuwa. I’ve been going through the projects listed for GSoC 2017 and I would like to work on the Patient Matching 2.0 project. I am quite familiar with Java, SQL, and HTML/CSS, which are required for this project, and I have a little knowledge of Apache Hadoop. I believe this would be a great chance for me to expand my skills as well as to contribute to this project to the best of my ability.

Thank you.


Hi @lahiruj @ayd :slight_smile: This thread is only for questions about the project itself, not for introductions. Please introduce yourself here. Go through the following links to learn more about the project and to get started with OpenMRS :smile: Patient Matching Project, getting started, developers manual. Hope you have a great time at OpenMRS :blush:

Hi Everyone,

Something in the primary objective confused me. It says:

As part of this project, you will refactor the Patient Matching module to implement a more efficient approach to only processing patient records that have been recently added, or their identifying information has changed since the last time patient matching was performed.

The primary goal is to make this process efficient by reducing the size of the dataset. My confusion is with the part that I highlighted. Suppose a particular patient’s information was changed a few years back (say 5 or 6 years ago). Does that mean the record will also be processed, regardless of when the information was changed?

Thank you :slight_smile:

Thank you, @suthagar23. I’m currently working on issues as well. :slight_smile:

In the first run of patient matching, all patient records would be processed. When you run patient matching again two days later, you would only run against patients whose demographics had changed (new patients and edited patients) over the past 48 hours. Currently, the module will compare every patient to every patient again because it doesn’t consider that it just checked those records two days earlier.
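
To make the selection step concrete, here is a minimal sketch in Python rather than the module's Java. It is not the module's code: the `changed_since` helper and the `date_created` / `date_changed` keys are hypothetical names chosen for illustration. A record edited years ago is only picked up if no matching run has happened since that edit.

```python
from datetime import datetime, timedelta

def changed_since(patients, last_run):
    """Return patients created or edited after the previous matching run.

    Each patient is a dict with hypothetical keys 'date_created' and
    'date_changed' (None when the record was never edited).
    """
    selected = []
    for patient in patients:
        latest = patient["date_changed"] or patient["date_created"]
        if latest > last_run:
            selected.append(patient)
    return selected

now = datetime(2017, 3, 10)
last_run = now - timedelta(hours=48)  # previous run was two days ago

patients = [
    # Created years ago and never edited: already covered by the earlier run.
    {"id": 1, "date_created": datetime(2012, 1, 5), "date_changed": None},
    # Old record edited yesterday: must be matched again.
    {"id": 2, "date_created": datetime(2011, 6, 1), "date_changed": datetime(2017, 3, 9)},
    # New record added yesterday: must be matched.
    {"id": 3, "date_created": datetime(2017, 3, 9, 12), "date_changed": None},
]

changed_ids = [p["id"] for p in changed_since(patients, last_run)]
print(changed_ids)
```

Only patients 2 and 3 are selected here, because patient 1 has not changed since the previous run.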

Thanks for the clarification :slight_smile:

@burke Is there any particular indexing mechanism in the current implementation? :slight_smile:

The current implementation uses a linkage engine that uses text files and the command line’s sort command to perform the sorting. I don’t think there’s (yet) a mechanism to persist previous matches. Part of the challenge of this GSoC project is to come up with an effective/efficient way of “remembering” which matches have already been performed so only new/changed data needs to be considered on subsequent runs of the same patient matching algorithm.

The problem

Let’s say we have 100,000 patients in our system and we want to match using names, birthdate, and gender. If we compare all patients to all others, that’s 5 billion – C(100000,2) – comparisons. If we run the same match 2 days later after 100 patients have been added and 10 updated, we’d rather run 11 million – C(110,2) + 110×100,000 – comparisons instead of 5 billion again. :slight_smile:
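
The figures above are rounded. A quick check with Python's standard library (`math.comb` needs Python 3.8 or later) gives the exact counts, using the same formula as the paragraph above; the variable names are just for illustration.

```python
from math import comb

existing = 100_000   # patients already in the system
changed = 100 + 10   # 100 added plus 10 updated since the last run

# Full run: every patient compared with every other patient.
full_run = comb(existing, 2)

# Incremental run: changed records against each other,
# plus each changed record against the existing patients.
incremental_run = comb(changed, 2) + changed * existing

print(full_run)         # 4999950000
print(incremental_run)  # 11005995
```

So the incremental run does roughly 450 times fewer comparisons than repeating the full run.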

Coming up with a way to solve this problem is one of the main goals of this GSoC project. Are existing text files sufficient to efficiently avoid repeating comparisons? Do we persist a record of matches performed in the OpenMRS database? Or do we need to introduce another mechanism for tracking prior comparisons? Or maybe we can simply infer which comparisons haven’t yet been done by persisting timestamps? Ideally, the module would not only be able to know which comparisons were previously performed, but also keep track of which comparisons were previously marked by the user as true matches or true non-matches.
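
As one possible shape for the last idea, here is a rough sketch that assumes prior user decisions are stored keyed by an ordered pair of patient IDs. The `pairs_to_compare` function and the `reviewed` store are hypothetical, not part of the module, and whether an edit to a patient should invalidate an earlier decision is left open here.

```python
from itertools import combinations

def pairs_to_compare(changed_ids, unchanged_ids, reviewed):
    """Return the pairs an incremental run still needs to compare.

    changed_ids: IDs of patients added or edited since the last run.
    unchanged_ids: IDs of patients untouched since the last run.
    reviewed: hypothetical store mapping (low_id, high_id) to a user
              decision such as "match" or "non-match".
    """
    changed = sorted(changed_ids)
    # Changed records are compared with each other...
    candidates = list(combinations(changed, 2))
    # ...and with every unchanged record. Unchanged pairs are skipped
    # entirely because the previous run already covered them.
    candidates += [(c, u) for c in changed for u in sorted(unchanged_ids)]

    result = []
    for a, b in candidates:
        key = (min(a, b), max(a, b))  # normalise so (4, 1) and (1, 4) agree
        if key not in reviewed:       # keep the user's earlier decision
            result.append(key)
    return result

reviewed = {(1, 4): "non-match"}
todo = pairs_to_compare({4, 5}, {1, 2, 3}, reviewed)
print(todo)
```

With two changed patients and three unchanged ones there are seven candidate pairs; the pair (1, 4) is skipped because the user already marked it, leaving six comparisons.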

5 posts were merged into an existing topic: GSoC 2017 - Patient Matching 2.0

@lahiruj, please create a new topic in this GSoC category with the title “GSoC 2017 - Patient Matching 2.0” with a few sentences describing the project, providing links to resources (e.g., wiki page, your blog) and letting folks know we’ll be using that topic for project planning & discussion. Once you’ve created it, I can migrate some of our conversation from this thread into that one and we can continue public project discussion there.

I have created the thread.