I’m attempting to use the patient matching module and it doesn’t seem to be reporting any duplicates in the generated reports even when I have matching records in the DB based on my matching strategy.
Could the issue be because of the settings I have in my patient matching configuration XML file? I’m using the default configuration that comes with the module and a very simple strategy that uses DOB and gender as the blocking fields (must match), while family and given name are used to decide the actual matches (should match).
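To make the strategy concrete, here is a minimal Python sketch of what I expect it to do. This is not the module's code: the record layout and the helper names (`candidate_pairs`, `names_agree`) are made up for illustration, and the final name comparison is simplified to an exact, case-insensitive check rather than the weighted scoring the module presumably applies.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; the field names are illustrative, not the module's schema.
patients = [
    {"id": 1, "dob": "1980-01-01", "gender": "M", "given": "John", "family": "Smith"},
    {"id": 2, "dob": "1980-01-01", "gender": "M", "given": "John", "family": "Smith"},
    {"id": 3, "dob": "1980-01-01", "gender": "M", "given": "Peter", "family": "Jones"},
    {"id": 4, "dob": "1975-06-30", "gender": "F", "given": "John", "family": "Smith"},
]


def candidate_pairs(records, blocking_fields):
    """Group records by the blocking fields (must match) and pair up records within each group."""
    blocks = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in blocking_fields)
        blocks[key].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def names_agree(a, b, fields=("given", "family")):
    """Simplified 'should match' step: every name field agrees, ignoring case and padding."""
    return all(a[f].strip().lower() == b[f].strip().lower() for f in fields)


pairs = list(candidate_pairs(patients, ("dob", "gender")))
duplicates = [(a["id"], b["id"]) for a, b in pairs if names_agree(a, b)]
print(duplicates)
```

Only records that share both DOB and gender are ever compared, so patient 4 is never paired with patients 1 or 2 despite having the same names.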
Those familiar with the module, what do you think I could be missing?
Based on the repo and comments on the module’s wiki page, it doesn’t look like the module has been used in many years… which is probably a big reason why you aren’t getting many responses.
The theory & under-the-hood approach to matching are derived from @sgrannis’s work, so they are undoubtedly sound; however, I don’t know if the module is being actively used by anyone.
I went looking for when James Egg (the developer who worked on the module) left Regenstrief. Here’s a conversation I found from 2016 of someone asking Shaun about the patient matching module:
That conversation was from 4 years ago. I don’t think you’re going to find any developers still familiar with the module. If you can make use of the code available, then go for it. If you need advice on the theory behind the module beyond what you can infer from the code and the discussion above, I’m sure Shaun would be happy to help.
Thanks @burke for your response. I was hoping that someone had set up the module in the past with a basic configuration and might know what I’m missing. I did set up what I think is a basic configuration that the module should be able to work with to determine that two records are duplicates, but it can’t.
@wyclif, have you made any progress? If you’ve had success, how did you solve your problems? Did you happen to learn what “transposable” fields are along the way?
The issue was that I was testing with only 2 records. You need at least 3 records, and preferably at least one matching and one mismatching record, because the matching logic needs to be trained with some records before it starts working properly.
I noticed the following issues with the module (not the GUI application):
It doesn’t work for 2.x and above, though I fixed this.
It doesn’t go by the matching configuration from the specified config file and always performs an exact match.
The matching algorithm does not ignore voided records.
The module’s UI for creating matching strategies is very limited. All strategies built from it only perform exact matches; it doesn’t provide an option to select a different algorithm, set a different threshold, or apply the other configurations that you would specify in the config file.
I’d like to try to surface some useful information being shared in off-list conversations about the patient matching module…
When merging matched patients, the “losing” patient gets voided (merged) but on subsequent runs it gets re-matched again with the winning patient. Instead, we would expect the module to exclude voided patients when finding duplicates.
The state column in the patientmatching_matchingset table remains as PENDING even after merging patients. We’re not sure this is the correct behavior.
“Transposable” fields can be swapped with one another – e.g., given name & family name can be treated essentially as one field. If you define fields as transposable with one another, we believe the algorithm will take that into account.
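As a rough illustration of that reading of "transposable", the Python sketch below accepts a pair of records when the names agree either in the usual order or swapped. The function names and record layout are hypothetical, and we have not confirmed that the module implements transposition this way.

```python
def agree(a, b):
    """Case-insensitive comparison that ignores surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()


def names_match(rec_a, rec_b, transposable=True):
    """Compare given/family names, optionally allowing the two fields to be swapped."""
    straight = (agree(rec_a["given"], rec_b["given"])
                and agree(rec_a["family"], rec_b["family"]))
    if straight or not transposable:
        return straight
    # Transposed comparison: given vs. family and family vs. given.
    return (agree(rec_a["given"], rec_b["family"])
            and agree(rec_a["family"], rec_b["given"]))


# Hypothetical records where the names were entered in the opposite order.
first = {"given": "Amina", "family": "Otieno"}
second = {"given": "Otieno", "family": "Amina"}

print(names_match(first, second))
print(names_match(first, second, transposable=False))
```

With transposition allowed, the two records are treated as agreeing on names; without it, they are not.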
The patient matching module currently implements a probabilistic algorithm, but does not support deterministic methods.
A probabilistic matching algorithm is useful for identifying likely matches and possible matches between two relatively large sets of patients (at least hundreds, thousands, or more). This can be useful for discovering the best data for matches, getting the probability of a match for near matches, and automatically adjusting matching based on available data.
A deterministic matching algorithm is a pre-defined algorithm applying a set of known & reproducible rules for determining whether patients match. This is what most sites implement on their own (e.g., consider patients to match if the patients have the same identifier OR gender + names + date of birth OR …). This can work on any number of patients, but is less likely to provide a probability of match (for non-perfect matches) and does not adjust to the data provided.
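The difference between the two approaches can be sketched in Python. The deterministic function applies a fixed rule (same identifier OR same gender + names + date of birth). The probabilistic function sums per-field agreement and disagreement weights in the style of Fellegi-Sunter record linkage, which is a common formulation; we have not verified that the module computes its scores exactly this way. The m/u values, field names, and function names are assumptions for illustration, and in a real probabilistic matcher the m/u values would be estimated from the data rather than hard-coded.

```python
import math

# Assumed probabilities per field (illustrative values only):
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
M_U = {
    "given": (0.90, 0.05),
    "family": (0.90, 0.02),
    "dob": (0.95, 0.01),
    "gender": (0.98, 0.50),
}


def deterministic_match(a, b):
    """Fixed rule: same identifier OR same gender + names + date of birth."""
    same_identifier = (a.get("identifier") is not None
                       and a.get("identifier") == b.get("identifier"))
    same_demographics = all(a[f] == b[f] for f in ("gender", "given", "family", "dob"))
    return same_identifier or same_demographics


def probabilistic_score(a, b):
    """Sum of log-likelihood-ratio weights; a higher score means a more likely match."""
    score = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            score += math.log2(m / u)               # agreement weight
        else:
            score += math.log2((1 - m) / (1 - u))   # disagreement weight
    return score


# Hypothetical records: the second has a misspelled given name and no identifier.
rec_a = {"identifier": "100-1", "given": "John", "family": "Smith",
         "dob": "1980-01-01", "gender": "M"}
rec_b = {"identifier": None, "given": "Jon", "family": "Smith",
         "dob": "1980-01-01", "gender": "M"}
rec_c = {"identifier": "200-7", "given": "Mary", "family": "Okafor",
         "dob": "1992-03-15", "gender": "F"}

print(deterministic_match(rec_a, rec_b))
print(probabilistic_score(rec_a, rec_b))
print(probabilistic_score(rec_a, rec_c))
```

The deterministic rule rejects the near match outright, while the probabilistic score still ranks it well above the unrelated record; a site would then pick a threshold above which pairs are reported as likely duplicates.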
I did finally verify that the patient matching actually filters out voided records, so this was a false alarm on my side.
@mksd yes, as Burke mentioned, the patient matching module is designed for probabilistic matching. We are trying to find out from Shaun and Andrew whether it’s possible to tweak it to behave in a more deterministic way based on a config setting.
Also, the module doesn’t mark merged records in the patientmatching_matchingset table as merged. We might want to fix that, but it’s not a blocker for the matching algorithm.