Help optimizing search results in OCL?

paynejd · April 3, 2019, 3:01pm

@akanter @darius @ball @raff Hi all, Andela is making great progress on the new “OCL for OpenMRS” interface and it has highlighted the need to improve how OCL returns search results and handles search criteria. Could you help us put together some requirements for how this would ideally work? I understand that some work was already done on optimizing concept search within OpenMRS – is that true? – perhaps we can borrow from that approach. Feel free to add examples of searches NOT working as expected here too. A couple of items discussed already:

Weight concept names and IDs higher than they currently are weighted – right now, a search term showing up in a description shifts the results too much
Support multiple partial search tokens (e.g. “dia mel” should match “diabetes mellitus”)

Others?

raff · April 3, 2019, 3:57pm

Since OpenMRS and OCL both use Lucene under the hood, it’s straightforward to make them behave the same way.

darius · April 3, 2019, 7:01pm

I believe that behavior in the OpenMRS 2.x Reference Application follows some pretty simple weighting that PIH wrote in the Mirebalais era. IIRC it was tested/tuned by devs, and not really informed by end-user testing, but this hasn’t been a big complaint, so maybe it’s decent? But this has not been tested/tuned against full CIEL, so in that sense it’s not good enough for OCL.

We only search based on:
- exact match on concept.mappings[].code (case-insensitive)
- fuzzy/partial match on concept.names[] (split into words, mode=anywhere)

Results are then ranked as follows:
- If there’s an exact match on a mapping (e.g. user typed 1234 and you find a concept with “same as CIEL:1234”), rank this super-high (score +10000)
- If there’s a case-insensitive exact match on a concept name, rank very high (score +1000)
- If the name that is fuzzy/partially matched is a locale-preferred name, rank high (score +500)
- Shorter names are displayed before longer names (score -1 * length(matched name))

Code for this is at https://github.com/openmrs/openmrs-module-emrapi/blob/master/api/src/main/java/org/openmrs/module/emrapi/concept/HibernateEmrConceptDAO.java#L210-L225
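
To make the weighting above concrete, here is a minimal Python sketch of those rules. It is not the emrapi code (which is Java/Hibernate) and not how OCL works today. The `Concept` and `Name` classes, the `score` and `rank` functions, and the choice to add the mapping bonus on top of the best name score are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:
    text: str
    locale_preferred: bool = False


@dataclass
class Concept:
    names: List[Name]
    mapping_codes: List[str] = field(default_factory=list)


def _matches_all_words(query: str, name: str) -> bool:
    """True if every query word occurs anywhere in the name (case-insensitive)."""
    lowered = name.lower()
    return all(word in lowered for word in query.lower().split())


def score(concept: Concept, query: str) -> Optional[int]:
    """Return a relevance score, or None if the concept does not match at all."""
    q = query.strip().lower()
    if not q:
        return None
    # Exact, case-insensitive match on a mapping code: +10000
    mapping_hit = any(code.lower() == q for code in concept.mapping_codes)
    name_scores = []
    for name in concept.names:
        if not _matches_all_words(q, name.text):
            continue
        s = -len(name.text)          # shorter names first
        if name.text.lower() == q:
            s += 1000                # exact name match
        if name.locale_preferred:
            s += 500                 # matched name is locale-preferred
        name_scores.append(s)
    if not mapping_hit and not name_scores:
        return None
    total = 10000 if mapping_hit else 0
    if name_scores:
        # Assumption: only the best-scoring matched name counts.
        total += max(name_scores)
    return total


def rank(concepts: List[Concept], query: str) -> List[Concept]:
    """Return matching concepts, highest score first."""
    scored = [(score(c, query), c) for c in concepts]
    matched = [(s, c) for s, c in scored if s is not None]
    return [c for s, c in sorted(matched, key=lambda pair: pair[0], reverse=True)]
```

With these weights, a concept whose preferred name is “Pulmonary edema” outranks “Postoperative pulmonary edema” for the query “pulmonary ede” purely because of the length penalty.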

Taking a step back, things that seem important to me (but I am not a real user, and I haven’t managed concepts in years…)

- if they type a number, make sure that matching ids or mapping codes appear high up
- if there’s an exact name match on name/synonym, this needs to be the top result
- name match should be weighted much higher than description matching or full-text matching
- all things being equal, show results with a shorter preferred name vs a longer one (because CIEL is full of concepts that are too specific for the typical OpenMRS use case, but in the current sorting these concepts often show up first)

Here’s an example that’s particularly annoying to get right, given what the CIEL data looks like. Say I’m trying to find 70116 - Acetaminophen

https://openconceptlab.org/search/?q=acetaminophen => concept I’m looking for is not in the first several pages, even though it’s an exact match on the fully-specified locale-preferred name
https://openconceptlab.org/search/?q=paracetamol => concept I’m looking for is the last result on the first page

https://openconceptlab.org/search/?q=type+2+diabetes => concept I’m looking for is result #17, even though it’s an exact match on a synonym. => exact match on a synonym should be above partial match on locale-preferred name

https://openconceptlab.org/search/?q=pulm+edem => zero results, but should find pulmonary edema => need to support partial/wildcard/fuzzy matches on all words.
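
One way to read “partial matches on all words” is prefix matching per token: every token in the query must be a prefix of some word in the name. The sketch below shows that idea in plain Python. The function names are hypothetical, and a real deployment would more likely do this inside the search engine (for example with wildcard or prefix queries) rather than in application code.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_partial_tokens(query: str, name: str) -> bool:
    """True if every query token is a prefix of at least one word in the name."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return False
    name_words = tokenize(name)
    return all(
        any(word.startswith(token) for word in name_words)
        for token in query_tokens
    )
```

Under this rule “pulm edem” matches “Pulmonary edema” and “dia mel” matches “Diabetes mellitus”, while “pulm edem” does not match “Pulmonary embolism” because no word starts with “edem”.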

https://openconceptlab.org/search/?q=pulmonary+ede => result I’m looking for is #6. As a heuristic, I think shorter names should appear before longer names, e.g. “pulmonary edema” should normally appear before “Postoperative Pulmonary Edema” or “pulmonary edema due to …”.

https://openconceptlab.org/search/?conceptClass=Drug&q=aspirin => lots of acetaminophen results show up because they have “aspirin free” in their synonyms => therefore, prefer matches on preferred name over matches on synonym. (this example may be impossible to get quite right because we can’t programmatically know if a synonym has a “not” meaning without adding a lot more business logic, and we usually do want to show matches)

ehebel · October 15, 2019, 3:15am

Hi everyone, my name is Esteban Hebel, I’m a radiologist and clinical informatician from South America. I contacted Darius and Jonathan some time ago, installed OCL on an institutional server and started loading it with our local terminologies, collections and mappings. I’m fairly new to OCL and particularly to OpenMRS. I’m a SNOMED-CT guy and I started loading concepts with their Fully Specified Name in Spanish, including the semantic tag (e.g. “disorder”), and Descriptions, both as OCL Descriptions or as Synonyms, using RF2 as the source. My problem is that I want to populate an autocomplete in a webapp and I can’t find the proper way to get the actual list that matches the query (?q=) when I search by Synonym or Description, since OCL seems to respond only with the first Locale Preferred term. Is there any way to query across all Descriptions to get the matching list, in order to select a ConceptID indirectly by choosing a related DescriptionID instead?

I’m really impressed with OCL and I discover new uses by the day. I use it almost daily with the decision makers to highlight the crucial importance of having terminology services across the institutions. Please keep up with this amazing job!

Thanks! Esteban

akanter · October 15, 2019, 2:47pm

Esteban, I was not aware of your work with SNOMED and OCL. I am the Terminology lead at OpenMRS and publish the CIEL concept dictionary with its maps to reference terminologies like SNOMED and LOINC. I don’t have the engineering experience to discuss the sponge searching request you made, but perhaps @rafal or @paynejd would be able to help.

ehebel · October 16, 2019, 1:17am

Hi Andrew, Thanks for the quick reply! I work for the governmental office in the poorest region in the country, with a high level of illiteracy and home of the largest indigenous ethnic group in the country, the Mapuches. I was lucky to host Paul Biondich last year for a couple of days here in my home town of Temuco. Although we don’t use OpenMRS, it is a very potent source of inspiration and Paul was astonished when he saw the OMRS Database model printed and hanging from the wall of the former office CIO, literally in the last place on earth.

We discussed several topics, one of them being the lack of traction that the national terminology server had. He mentioned that OCL could solve at least the first step, and I have been trying ever since to get it up and running for a proper test run in production.

I would really appreciate being able to share some of my concerns, mostly related to the way you name the different elements in terminologies, since I have the feeling that it differs a little from SNOMED naming conventions.

My best regards,

Esteban

akanter · January 8, 2020, 3:25pm

Esteban, did we ever set up a time to chat about this?