Registration Core Similar Patient algorithm needs to be tweaked

mogoodrich · September 21, 2015, 5:31pm

Hello all–

So over the weekend we switched one of our main hospital implementations (with 150,000+ patients) from our legacy patient registration functionality to the registration app functionality that is part of the reference application. Unfortunately, in short order we had to back out to our old patient registration because the “similar patients” check that is part of registration core was getting way too many hits and bringing the system to a grinding halt.

I’m in the process of loading up a sample copy of our large database to test in greater detail, but, looking at the code, it looks like the registration app hits the “similar patient” endpoint via AJAX after every field is entered, and the registration core endpoint does a query as long as there is at least one field to search against.

We need to limit this somehow. My first thought would be to restrict the search so that it is only kicked off if at least two elements in the person name are populated. I’d also add a max results parameter to the hibernate criteria. Thoughts?

Take care, Mark

raff · September 21, 2015, 6:02pm

Your tweaks make sense to me. Let us know if it is enough to address the issue or we need to tweak it further.

darius · September 21, 2015, 6:41pm

Make sure you make this change at the algorithm level, and don’t bake it into the UI.

Last time @mseaton and I discussed this I think we concluded that the client would still ask “any similar patients?” each time you exit a field, but the default similar patient search algorithms on the server side should no-op and return quickly if there wasn’t enough data.
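A minimal sketch of that server-side guard, in plain Java. Every name here (`GuardedSimilarPatientSearch`, `Query`, `findSimilar`, the `minNameParts` and `maxResults` parameters) is hypothetical and not the registration core API; the expensive query is passed in as a function so the guard can be shown in isolation.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class GuardedSimilarPatientSearch {

    /** Hypothetical stand-in for the name fields entered so far in the form. */
    public static final class Query {
        final String givenName;
        final String familyName;

        public Query(String givenName, String familyName) {
            this.givenName = givenName;
            this.familyName = familyName;
        }
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static int populatedNameParts(Query q) {
        int n = 0;
        if (!isBlank(q.givenName)) n++;
        if (!isBlank(q.familyName)) n++;
        return n;
    }

    /**
     * Returns an empty list without touching the backend when too few name
     * parts are filled in; otherwise runs the search and caps the result size.
     */
    public static List<String> findSimilar(Query q, int minNameParts, int maxResults,
                                           Function<Query, List<String>> expensiveSearch) {
        if (populatedNameParts(q) < minNameParts) {
            return Collections.emptyList(); // no-op: not enough data to search on
        }
        List<String> hits = expensiveSearch.apply(q);
        // In-memory cap for illustration only. With a Hibernate Criteria query
        // the limit would go on the query itself via setMaxResults(int).
        return hits.size() > maxResults ? hits.subList(0, maxResults) : hits;
    }
}
```

With this shape the client can keep calling the endpoint on every field exit, and the decision about what counts as "enough data" stays with the algorithm.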

wyclif · September 21, 2015, 6:48pm

I agree with Darius that you might want to make the change in the similar patient matching algorithm implementation.

mogoodrich · September 21, 2015, 6:52pm

Yep, agreed… I will implement in the algorithm.

mogoodrich · September 21, 2015, 7:55pm

Ah, so it looks like our main problem was not with the basic algorithm, but with the NamePhoneticsPatientSearchAlgorithm. It looks like the NP search worked by querying all name matches using the NP service and intersecting them with the result of a basic similar patient algo with the name component excluded. So if you pass in only a name, this second query was basically returning all non-voided patients in the DB. No wonder it crashed the system!
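To illustrate the failure mode: the intersection is only useful when the second query has something to filter on. The sketch below shows one possible way to avoid the unconstrained query, by skipping it when no non-name field was supplied. It is an assumption for illustration, not the fix that was actually applied, and `PhoneticIntersection`, `combine`, and `hasNonNameCriteria` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

public class PhoneticIntersection {

    /**
     * @param phoneticMatches    patient ids returned by the name-phonetics lookup
     * @param hasNonNameCriteria true if any field other than the name was supplied
     * @param nonNameQuery       runs the search on the non-name fields (potentially expensive)
     */
    public static Set<Integer> combine(Set<Integer> phoneticMatches,
                                       boolean hasNonNameCriteria,
                                       Supplier<Set<Integer>> nonNameQuery) {
        Set<Integer> result = new LinkedHashSet<>(phoneticMatches);
        if (!hasNonNameCriteria) {
            // With nothing but a name, the second query would have no
            // restrictions and would match every non-voided patient,
            // so it is never executed.
            return result;
        }
        result.retainAll(nonNameQuery.get()); // intersect the two result sets
        return result;
    }
}
```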

We will still probably want to add the restrictions to the basic search regardless, however. I’m going to work on that and then move on to improving the NP search.

mogoodrich · September 21, 2015, 10:03pm

Hmm… so it’s not 100% clear what to do in this case. Restricting the search so that it only executes if 2 or more elements of the patient name are populated wouldn’t do a ton of good since the DB query is an OR query, not an AND, so it wouldn’t reduce the number of matches returned.

Basically, the basic algorithm does an OR search on all available fields, and then loops over the returned patients and scores them based on which fields match, etc. This seems a reasonable way to do things as long as you’ve got a small-to-medium sized patient set. Since (I believe?) we are switching to Lucene in 1.11, it seemed like it wasn’t worth putting too much effort into reworking this. So my plan is to focus just on the Name Phonetics algorithm and improve it to function similarly to the algorithm we have in our legacy patient registration module.
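A rough in-memory sketch of that "OR, then score" shape, to show why the candidate set grows with the database. The `Patient` fields, the one-point-per-field scoring, and the class name `OrSearchAndScore` are assumptions for illustration; the real algorithm runs the OR as a database query and uses its own scoring.

```java
import java.util.ArrayList;
import java.util.List;

public class OrSearchAndScore {

    /** Hypothetical, simplified patient record. */
    public static final class Patient {
        final String givenName;
        final String familyName;
        final String gender;

        public Patient(String givenName, String familyName, String gender) {
            this.givenName = givenName;
            this.familyName = familyName;
            this.gender = gender;
        }
    }

    public static final class Scored {
        public final Patient patient;
        public final int score;

        Scored(Patient patient, int score) {
            this.patient = patient;
            this.score = score;
        }
    }

    // A null query field never matches, so unfilled fields are ignored.
    static boolean matches(String queryValue, String candidateValue) {
        return queryValue != null && queryValue.equalsIgnoreCase(candidateValue);
    }

    public static List<Scored> search(List<Patient> all, Patient query) {
        List<Scored> out = new ArrayList<>();
        for (Patient p : all) {
            int score = 0;
            if (matches(query.givenName, p.givenName)) score++;
            if (matches(query.familyName, p.familyName)) score++;
            if (matches(query.gender, p.gender)) score++;
            // OR semantics: a single matching field makes the patient a candidate.
            if (score > 0) out.add(new Scored(p, score));
        }
        out.sort((a, b) -> Integer.compare(b.score, a.score)); // best matches first
        return out;
    }
}
```

Because any one matching field qualifies a patient, a low-selectivity field such as gender pulls in a large share of the database regardless of how many name parts are filled in, which is why requiring two name elements alone would not shrink the result set.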

burke · September 23, 2015, 3:10am

We should have two types of searches. I think we may have referred to them as “sensitive” and “specific”, but the point is that the algorithms/scoring & attributes compared may differ between a sensitive search (used when you are trying to help the user find a patient based on partial information) and a specific search (used to help ensure that the new patient you’re about to create isn’t a duplicate).

In any case, Lucene will make finding potential matches & having scores for them much simpler and return results nearly instantly even for large patient sets. So, use the 80/20 rule when trying to make the pre-Lucene search functional.