Making patient search Lucene analyzers configurable

I’m working on making Lucene analyzers for patient search configurable. The idea is that an implementer should be able to control the analyzers and filters used by Lucene by changing global properties. This should also make it possible to add custom analyzers, as has been done with the NamePhonetics module.

I have a few questions.

1 – @raff very helpfully pointed the way – the Hibernate Search Programmatic API seems like the way to go for this. We would replace the AnalyzerDefs annotation on BaseOpenmrsObject with some setup code. I am not sure where exactly that setup code should go. Somewhere that runs on each startup? I’m still very unfamiliar with OpenMRS Core’s non-API code.

2 – The Hibernate Search docs show a programmatically defined analyzer being attached to an object using

cfg.getProperties().put("hibernate.search.model_mapping", mapping);

I don’t know of anywhere in the OpenMRS code that uses Hibernate’s programmatic configuration API – everything I know of is XML. Are the two interchangeable? Do they play nice together? Do we have programmatic access to the Hibernate Configuration?

3 – Suggestions about the Global Properties API we should support? I assume we need to stay backward-compatible with patientSearch.matchMode (which allows the user to choose between the START analyzer and the ANYWHERE analyzer). I’d like to let users simply provide a list of filters, from which a single analyzer would be composed and then used in getPersonNameQuery. A user might, for example, provide the global property

patientSearch.searchFilters="ClassicFilter,LowerCaseFilter,ASCIIFoldingFilter,DoubleMetaphoneFilter"

and an analyzer like

    .analyzerDef( "name", StandardTokenizerFactory.class )
        .filter( ClassicFilterFactory.class )
        .filter( LowerCaseFilterFactory.class )
        .filter( ASCIIFoldingFilterFactory.class )
        .filter( DoubleMetaphoneFilterFactory.class )

would be composed. The downsides of this are that it only allows you to define a single analyzer for names, and that there’s no way to provide parameters to filters.
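To make the idea concrete, here is a minimal sketch of how the property value might be turned into a list of filter factory names. It assumes a naming convention that is not established anywhere in this thread: each entry is a Lucene filter name, and its factory is named by appending "Factory". The class SearchFilterProperty and the method toFactoryNames are hypothetical, and the sketch deliberately stops short of calling Hibernate Search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenMRS Core.
final class SearchFilterProperty {

    // Assumed convention: the factory for "XyzFilter" is named "XyzFilterFactory".
    static List<String> toFactoryNames(String propertyValue) {
        List<String> factories = new ArrayList<>();
        if (propertyValue == null) {
            return factories; // property not set: no filters
        }
        for (String entry : propertyValue.split(",")) {
            String name = entry.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            if (!name.endsWith("Filter")) {
                throw new IllegalArgumentException("Not a filter name: " + name);
            }
            factories.add(name + "Factory");
        }
        return factories;
    }

    public static void main(String[] args) {
        String value = "ClassicFilter,LowerCaseFilter,ASCIIFoldingFilter,DoubleMetaphoneFilter";
        // prints [ClassicFilterFactory, LowerCaseFilterFactory, ASCIIFoldingFilterFactory, DoubleMetaphoneFilterFactory]
        System.out.println(toFactoryNames(value));
    }
}
```

Resolving these names to actual factory classes, and passing them to the analyzer definition in order, would still have to be done by whatever code builds the mapping.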

4 – Another possibility is that we simply don’t expose a Global Properties API for this, but rather provide some way to make it configurable from a module. Maybe analyzers could be created in Spring components, which whatever code initializes the BaseOpenmrsObject analyzers would discover via Spring magic? Or would it work for modules to simply use the Hibernate Search Programmatic API (or some abstraction thereof) in their Activators?

@mogoodrich

1-2 You just need to follow Example 64. Use a mapping factory.

3-4 Please note that whenever you change analyzers, the data has to be re-indexed. I feel that sooner or later we’ll need to customize via modules so I’d go with 4. The core factory can auto-wire some factories found in modules and build up a single SearchMapping config.
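A rough sketch of that "build up a single config" step, using plain collections in place of Hibernate Search's SearchMapping so that it runs on its own. The interface AnalyzerContributor and the class CoreMappingBuilder are hypothetical names; in Core the list of contributors would presumably be injected by Spring rather than built by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension point that a module would implement.
interface AnalyzerContributor {
    // Adds or replaces analyzer definitions: analyzer name -> ordered filter factory names.
    void contribute(Map<String, List<String>> analyzerDefs);
}

// Hypothetical stand-in for the core factory that assembles one configuration.
final class CoreMappingBuilder {

    static Map<String, List<String>> build(List<AnalyzerContributor> contributors) {
        Map<String, List<String>> defs = new LinkedHashMap<>();
        for (AnalyzerContributor contributor : contributors) {
            // Contributors run in list order, so a later one can replace an earlier definition.
            contributor.contribute(defs);
        }
        return defs;
    }

    public static void main(String[] args) {
        List<AnalyzerContributor> contributors = new ArrayList<>();
        // Core default for the name analyzer.
        contributors.add(defs -> defs.put("name", Arrays.asList("LowerCaseFilterFactory")));
        // A module replacing it with a phonetic chain.
        contributors.add(defs -> defs.put("name",
                Arrays.asList("LowerCaseFilterFactory", "DoubleMetaphoneFilterFactory")));
        System.out.println(build(contributors));
    }
}
```

In a real implementation the merged result would be translated into analyzerDef calls on a single SearchMapping, and any change to it would still require a re-index.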

Thanks! I think I’ll go with (4), then.

My first question is more basic – I understand that I need to follow Example 64, I just literally don’t know where to put that code. I don’t know what Core does at startup, or where to look for that code. @mseaton or @mogoodrich?

@bistenes not sure I know the answer either, as I rarely work with Core startup… but what is Example 64?

It’s Example 64 from https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#hsearch-mapping-programmaticapi

@bistenes, you just need to create a factory class (anywhere you want), which should be auto-discovered and instantiated by Spring at the right time at startup. You also need to modify https://github.com/openmrs/openmrs-core/blob/master/api/src/main/resources/hibernate.cfg.xml to specify the hibernate.search.model_mapping property pointing to that factory.

Thanks @raff… does that mean that the only change that would need to be made to core would be to add hibernate.search.model_mapping pointing to the factory in hibernate.cfg.xml? I don’t suppose there is any way we could inject that property into the hibernate cfg from a module? Or add something simple to that config that allows modules to modify it?

I’m just thinking if we can do this without changing core, or with making a small enough change in core that we can backport it to 2.2.x (the version of Core we are currently using).

(If not, or if it requires too many hoops, we can look into upgrading to 2.3.x or 2.4.x, that should be possible for us… just wanted to make sure we considered any easier path).

Thanks! Mark

As @bistenes noted we’ll need to move away from Hibernate Search annotations to programmatic API in core so modifications in core are inevitable. You cannot use both annotations and programmatic API.

Ah, okay, I missed this point… that seems to make this a rather big initiative, replacing the entire annotation approach in core with a programmatic approach… ? Or am I missing something?

There aren’t that many Hibernate Search annotations yet as we just store concepts and patients… + it’s straightforward to convert. I wouldn’t say it’s a big initiative.

Ah, got it, @raff… thanks. So I guess not too big in terms of work, but just something that would need to be carefully tested and isn’t backportable since it affects a pretty core piece of code.

@raff @mogoodrich I’ve got Lucene AnalyzerDefs being provided programmatically with a factory class! See Jira ticket, PR.

However, I haven’t succeeded in overriding it from a module. Here’s my attempt (no, I’ve never really worked with Spring or Hibernate before). Is this the right way to go about providing custom analyzer definitions?

@bistenes I made a few quick comments on some of the pull requests, but happy to help with some ideas around this. I don’t know if the factory itself can be configured as a Spring bean, but it could certainly delegate to a Spring Bean that implements an interface, and we could set things up such that modules can override that by adding a Spring bean that implements the same interface but has a lower sort order. Happy to discuss and pair on it as needed.
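A minimal sketch of the delegation idea, assuming the factory is handed every bean that implements a shared interface and uses the one with the lowest sort order. The names AnalyzerProvider, FixedProvider and ProviderSelector are hypothetical, and the Spring wiring is replaced by a hand-built list so that the example stands alone.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical interface that both core and modules would implement as Spring beans.
interface AnalyzerProvider {
    int getOrder();        // lower value wins
    String describe();     // stands in for "configure the analyzer definitions"
}

// Simple implementation used only for this illustration.
final class FixedProvider implements AnalyzerProvider {
    private final int order;
    private final String description;

    FixedProvider(int order, String description) {
        this.order = order;
        this.description = description;
    }

    public int getOrder() { return order; }
    public String describe() { return description; }
}

// Stand-in for the factory's delegation step.
final class ProviderSelector {

    static AnalyzerProvider select(List<AnalyzerProvider> providers) {
        return providers.stream()
                .min(Comparator.comparingInt(AnalyzerProvider::getOrder))
                .orElseThrow(() -> new IllegalStateException("No analyzer provider registered"));
    }

    public static void main(String[] args) {
        AnalyzerProvider core = new FixedProvider(100, "core defaults");
        AnalyzerProvider module = new FixedProvider(10, "module phonetic analyzers");
        // The module provider has the lower order, so it is selected.
        System.out.println(select(Arrays.asList(core, module)).describe());
    }
}
```

With this shape the factory referenced from hibernate.cfg.xml stays in Core, and a module only needs to register a provider with a lower order than the default.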