Unable to restrict Patient Searches by identifier type after upgrade to Lucene

On the whole, the Lucene patient search is awesome and greatly increased the performance of our patient searches.

However, one side effect is that the getPatients(name, identifier, identifierTypes, matchIdentifierExactly) API method is broken: the “identifierTypes” and “matchIdentifiersExactly” parameters are ignored.

    /**
	 * @see PatientService#getPatients(String, String, List, boolean, Integer, Integer)
	 */
	@Override
	@Transactional(readOnly = true)
	public List<Patient> getPatients(String name, String identifier, List<PatientIdentifierType> identifierTypes,
	        boolean matchIdentifierExactly, Integer start, Integer length) throws APIException {
		
		return dao.getPatients(name != null ? name : identifier, start, length);
	}

Ignoring the identifierTypes parameter seems particularly problematic. Our Pacs Integration module imports radiology reports from an outside system and uses this API method to match on identifier type. We recently ran into an error when an import failed because matching on primary identifier returned two patients, because another patient had a different identifier with the same value. In this case it just resulted in an error, but in other case it could have led to the report being linked to the wrong patient!

I’ve ticketed what I’ve found here. It feels like the first ticket at least should be a community priority:

Great that you have discovered these and immediately created tickets for them. :slight_smile:

Do you want to also make them ready for work?

Thanks for the reminder @dkayiwa!

I think the reason for this is, that inside the LuceneQuery.java file (LuceneQuery.java#L243) there exits a if clause that checks if the object is of type PersonIdentifier. If it is, which is the case for the getPatients calls, it will automatically set the analyser to EXACT_Match. In my opinion this is the reason why this parameter is ignored.

protected MultiFieldQueryParser newMultipleFieldQueryParser(Collection<String> fields) {
		Analyzer analyzer;
		if (getType().isAssignableFrom(PatientIdentifier.class) || getType().isAssignableFrom(PersonName.class) || getType().isAssignableFrom(PersonAttribute.class)) {
			analyzer = getFullTextSession().getSearchFactory().getAnalyzer(LuceneAnalyzers.EXACT_ANALYZER);
		} else {
			analyzer = getFullTextSession().getSearchFactory().getAnalyzer(getType());
		}

In my opinion that is not aligned to PatientIdentifier.java class where we try to set the matched of Lucene based on the field name:

Fields({
			@Field(name = "identifierPhrase", analyzer = @Analyzer(definition = LuceneAnalyzers.PHRASE_ANALYZER), boost = @Boost(8f)),
			@Field(name = "identifierExact", analyzer = @Analyzer(definition = LuceneAnalyzers.EXACT_ANALYZER), boost = @Boost(4f)),
			@Field(name = "identifierStart", analyzer = @Analyzer(definition = LuceneAnalyzers.START_ANALYZER), boost = @Boost(2f)),
			@Field(name = "identifierAnywhere", analyzer = @Analyzer(definition = LuceneAnalyzers.ANYWHERE_ANALYZER))
	})
	private String identifier;

It would be great if someone could verify my hypothesis. Perhaps @raff is able to confirm my idea?

Idea for fix: I am not so well with Lucene. Is there a ways to tell Lucence to use the field that is part of query ? basically like: fields.contains(identifierAnywhere) => use analyzer `

I think think makes sense @fruether… unfortunately I’m a newbie at Lucene as well, so can’t really weigh in… does anyone else out there have Lucene experience?

@raff do you have any thoughts on this? :slight_smile:

@fruether, the analyzer is applied when indexing, but also when querying. The query time analyzer can be different from the indexing one in rare cases as in here. We do it so, because we use different analyzers for different fields at the indexing time and we do not want to apply the default analyzer to the whole query for all fields.

Basically, leave the analyzers as is and if matchIdentifiersExactly is true you need to query the indentifierExact field, whereas if it is false you need to also query identifierPhrase, indetifierStart and identifierAnywhere. It’s analogous to how person name matching works based on a global property, see https://github.com/openmrs/openmrs-core/blob/master/api/src/main/java/org/openmrs/api/db/hibernate/PersonLuceneQuery.java#L72-L79

Alright. Did not know that they are applied twice. Thanks for the clarification @raff

The following test case would be enough to verify that this issue is fixt, right (@mogoodrich) ?

	patients = patientService.getPatients("123", null, types, true);
	assertEquals(0, patients.size());
	patients = patientService.getPatients("123", null, types, false);
	assertEquals(1, patients.size());

where only a Patient with the following ID exists:

<patient_identifier patient_identifier_id="1" patient_id="2" identifier="1234" identifier_type="1" preferred="1" location_id="1" creator="1" date_created="2005-01-01 00:00:00.0" voided="false" uuid="e997ac86-4d8a-40f3-bedb-da84d35917b8"/>

Ticket: https://issues.openmrs.org/browse/TRUNK-5396

Changes are part of the Pull request: https://github.com/openmrs/openmrs-core/pull/2688/

@fruether yes, that looks correct as a test case!

I was also discussing this in the Jira ticket but I think maybe it is more convenient to talk about it here:

My concern is regarding https://issues.openmrs.org/browse/TRUNK-53963 and the behaviour that is defined by the matchExactly parameter. Or to be more concrete: If the matchExactly parameter is set to false how should the matching behave. Should it match on start, anywhere or is there a third option?

To make the matter more concrete I created the following test case lines. Should the call return 0 or 1?

types.add(new PatientIdentifierType(1));
patients = patientService.getPatients("123", null, types, false);
assertEquals(1, patients.size());{{}}

Given that we are having the following Patient:

<patient_identifier patient_identifier_id="1" patient_id="2" identifier="1234" identifier_type="1" preferred="1" location_id="1" creator="1" date_created="2005-01-01 00:00:00.0" voided="false" uuid="e997ac86-4d8a-40f3-bedb-da84d35917b8"/>

I would say “anywhere”, but I don’t have a strong preference between that and “on start”.

I would think that both “on start” and “anywhere” would return 1 in your example, though, since “1234” starts with “123”… or am I misinterpreting somthing?

@mogoodrich

I would think that both “on start” and “anywhere” would return 1 in your example, though, since “1234” starts with “123”… or am I misinterpreting somthing?

The issue is that in my solution right now it returns 0. This is probably because neither onStart nor Anywhere is set as global property. Since 0 seems to be the worst option we have to go either for onstart or anywhere.

Since you prefer “anywhere” I will go for it if there is no other strong opinion on it.

More Details:

Code in question:

else if (OpenmrsConstants.GLOBAL_PROPERTY_PATIENT_SEARCH_MATCH_START.equals(matchMode)) {              fields.add("identifierStart");} 

else if (!matchExactly || OpenmrsConstants.GLOBAL_PROPERTY_PATIENT_SEARCH_MATCH_ANYWHERE.equals(matchMode)) {

Raf correctly mentioned that matchExactly can never be true at this point. This is true. But since no Global Property is set neither identifierStart not identifierAnywhere is assigned. As a result, 0 will be returned for the described test case. In my opinion 0 matches is the worst value to return especially since then a change of matchExactly would have almost no effect.

Therefore, I would like to have a defaul value to be set in case of no valid global property and matchExactly = false

Current code in new PR: Code

Thanks @fruether… I got a bit behind today, but will look to review this tomorrow…

Take care, Mark

I think I understand now, and agree with your solution, but couldn’t the code just be:

if (matchExactly) {
		fields.add("identifierExact");
	}
	else if (OpenmrsConstants.GLOBAL_PROPERTY_PATIENT_SEARCH_MATCH_START.equals(matchMode)) {
		fields.add("identifierStart");
	} 
	else {
		fields.add("identifierAnywhere");
	}

I think your proposed code should work as well. To be sure I will adapt it accordingly and see if any test case starts failing.

I can do this till Monday

Great, thanks @fruether!

@mogoodrich all the tests still passed.

@raff I think the issue with my last pull request regarding this should be solved now. The current new version of the PR with the adapted changed can be found her: